This post evaluates the programming languages ATS, C#, Go, Haskell, OCaml, Python and Rust to try to decide which would be the best language in which to write 0install (which is currently implemented in Python). Hopefully it will also be interesting to anyone curious about these languages.
I'm not an expert in these languages (except Python). My test-case is to read the tutorial for each language and reimplement one trivial function of 0install in the language. These languages were suggested by various people on the 0install mailing list. If I've got anything wrong, please add a comment.
Table of Contents
- Why replace Python?
- Brief summary of the candidates
- Test case
- Speed and size
- Binary compatibility
- Safety
- Diagnostics
- Ease of writing
- Shared libraries
- Static types
- Bounds on privilege
- Mutability
- C interoperability
- Asynchronous code
- Summary
- Round 2
Why replace Python?
Several people have asked for a version of 0install in a compiled language:
- Tim Cuthbertson wants to build a statically-linked binary for bootstrapping on systems which don't already have 0install, to increase performance and to simplify installation on OS X. He has started prototyping a Haskell version.
- Canonical's Colin Watson is worried about Python's performance on mobile phones.
- Marco Jez and Dave Abrahams proposed a C++ version.
- Bastian Eicher would like a .NET version (though IronPython might work here).
- The Sugar developers would like maximum performance on their low-powered XO laptops.
Personally, I'd like to use a language with static type checking to make changes to the code less risky, and to detect problems due to API changes in the platform (e.g. Python 3 broke the code in subtle ways which we're still discovering occasionally in less-used code-paths).
However, Python has worked well for us over the years and has many benefits:
- Widely known and easy to learn.
- Clear, readable syntax.
- A large standard library.
- Easy to debug.
- Generators make asynchronous code easy.
- You only need to ship source code (interpreted).
- Can run inside a Java or .NET VM (using Jython/IronPython).
- Can support multiple versions of library APIs (`if hasattr(...)` etc).
- All current 0install contributors know it.
- The current code is all Python and is well-tested.
Brief summary of the candidates
- ATS (version 0.2.8)
- A functional language with a very advanced type system, which includes dependent types (e.g. "a String of length n") and linear types (e.g. an obligation to close a file descriptor after use). An interesting feature of ATS is that its run-time types are identical to C's (e.g. an ATS string is a null-terminated array of chars), allowing interoperability with C without wrapping.
- C# (mcs 3.0.7.0)
- Microsoft's Java alternative. Much of the Windows version of 0install is written in C# (it currently uses IPC to talk to the Python solver process). It compiles to .NET bytecode, which can run on any platform. On non-Windows platforms, Mono can run .NET code.
- Go (1.1)
- Google's C replacement, with a focus on being light-weight and easy to use.
- Haskell (7.6.3)
- A lazy, purely functional language (no function can have side-effects; `main` essentially returns a request to print hello world).
- OCaml (4.00.1)
- Another functional language. It can be interpreted, compiled to platform-independent bytecode, or compiled to native code.
- Python (2.7.5 and 3.3.2)
- A popular, easy to learn interpreted language.
- Rust (0.6)
- Mozilla's experimental new language, which aims to be a safe replacement for C. It supports manual memory management (as well as optional garbage collection), using linear types to ensure that everything is used safely and freed correctly.
Test case
To get a feel for each language, we implemented a trivial piece of 0install in each one. The current (complete) Python version is the `runenv.py` listing in the "Ease of writing" section below.
In case you're wondering what this is for: this executable is used when a program A depends on another, B. A has this launcher script in its $PATH under the appropriate name. When A runs the launcher, the launcher runs the correct version of B with the appropriate interpreter and arguments. For example, A might run `foo --process` and the launcher might invoke `/usr/bin/python2.7 /var/cache/0install.net/.../foo.py --process`.
To avoid creating one launcher for every possible set of versions, the exact strings to use are placed in an environment variable in program A's environment when it is launched, and `foo` is just a symlink to the launcher. So, this program (the launcher):
- Finds out the name of the symlink used to invoke it.
- Gets the details of how to invoke program B from the environment (this is a JSON string list).
- Invokes the target with those arguments, plus any extra arguments passed to the launcher.
Note: I didn't make any particular effort to write the test code carefully. If it seemed to work, I went with it. In real code I would obviously try to be more careful, but since real code would be bigger, I'd also make more mistakes. I want to see how well each language prevents me from making mistakes.
Speed and size
0install doesn't require much CPU time, but it does need to start quickly (and this particular bit especially so). The table below shows how many times per second each version of the launcher was able to run a trivial C program. "Overhead" is the amount of time each run took over running the binary directly without a launcher. "Size" is the size of the binary being executed, excluding any shared libraries or runtime components.
Time (ms) | Overhead (ms) | Speed (runs/s) | Size (KB) | Language |
---|---|---|---|---|
1.74 | + 0.63 | 574 | 72.83 | ATS |
3.02 | + 1.90 | 332 | 1210.88 | OCaml (native) |
3.38 | + 2.26 | 296 | 1323.30 | Haskell |
8.99 | + 7.87 | 111 | 1907.68 | Go |
9.33 | + 8.22 | 107 | 326.79 | OCaml (bytecode) |
10.20 | + 9.09 | 98 | 130.58 | Rust |
51.88 | + 50.76 | 19 | 0.18 | Python3 |
82.98 | + 81.86 | 12 | 0.18 | Python2 |
83.37 | + 82.26 | 12 | 3.50 | C# (Mono) |
- ATS (5)
- ATS is the clear winner here. It's significantly faster than its closest rival, and at a fraction of the size. Note that the only smaller executables (Python and C#) depend on huge external runtimes. The ATS binary depends only on libjson.so, a 39 KB C library that, on my system at least, was already installed.
- Haskell and OCaml (4)
- These two garbage-collected functional languages both put in impressive performances.
- Rust (3)
- I was surprised and disappointed by how badly Rust did here. With its low-level focus and manual memory management, I was expecting it to get close to ATS. Maybe things will improve when it's more mature.
- Go (3)
- A very disappointing performance from Go too.
- Python (2)
- Python executables are small, but very slow. Note that Python 2 is usually faster than Python 3, but after a recent package update, Python 2 has suddenly become slower on my system. Python 2 was previously getting 27 runs per second.
- C# (1)
- Fiddling around with ahead-of-time compilation and static linking didn't seem to help here.
Except for C#, I didn't make any particular attempt to make the binaries smaller or faster, but just went with the compilers' defaults.
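The columns in the table above are related by simple arithmetic: "Overhead" is the mean run time minus the time to run the target directly, and "Speed" is 1000 divided by the run time in milliseconds. The following is a minimal sketch of how such numbers could be collected, not the harness actually used for this post. The function names are hypothetical, and the roughly 1.11 ms baseline used in the usage note is only what the table implies (1.74 - 0.63).

```python
import subprocess
import time


def time_runs(argv, runs=200):
    """Return the mean wall-clock time, in milliseconds, to run argv once."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.check_call(argv)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


def summarise(time_ms, baseline_ms):
    """Derive the 'Overhead' and 'Speed' columns from a mean run time."""
    return {
        "overhead_ms": time_ms - baseline_ms,
        "runs_per_second": 1000.0 / time_ms,
    }
```

For example, `summarise(1.74, 1.11)` gives an overhead of about 0.63 ms and about 574 runs per second, matching the ATS row. To measure a real launcher you would call `time_runs` once on the trivial target program and once on the launcher symlink, then pass both results to `summarise`. The cost of spawning from Python is included in both measurements, so it mostly cancels out of the overhead.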
Some of these languages depend on shared runtimes or libraries, which may or may not already be installed. For each one, I looked at how much extra software I had to install to make the binary run on a Debian 7 clean install (just "SSH server" and "Standard system utilities" selected during install). This is also a good test for binary compatibility problems (see next section).
"Dependencies" is the "additional disk space used" reported by `apt-get install ... --no-install-recommends`.
Total (KB) | Binary size (KB) | Dependencies (KB) | Language |
---|---|---|---|
0.18 | 0.18 | 0 | Python |
156.83 | 72.83 | 84 | ATS |
1210.88 | 1210.88 | 0 | OCaml (native) |
1323.30 | 1323.30 | 0 | Haskell |
1907.68 | 1907.68 | 0 | Go |
13242.58 | 130.58 | 13112 | Rust |
13335.79 | 326.79 | 13009 | OCaml (bytecode) |
31133.50 | 3.50 | 31130 | C# (Mono) |
However, this is slightly unfair because we'd need to use many other features for a full version of 0install. Languages with large standard libraries (e.g. Python and .NET) won't need much extra stuff.
- Python (5)
- On most Linux systems (and OS X), Python is installed by default.
- ATS (5)
- ATS would be the smallest by far, if Python wasn't pre-installed.
  `apt-get install libjson0 --no-install-recommends`
- OCaml (4)
- Having both native and bytecode options is convenient: we can use bytecode for programs that aren't speed critical, and native code for embedded situations. The native version has no dependencies. The bytecode version requires:
  `apt-get install ocaml-base libyojson-ocaml --no-install-recommends`
- Haskell (3)
- Only slightly larger than OCaml, but no bytecode option.
- Go (3)
- Go doesn't support dynamic linking, so there are no dependencies.
- Rust (2)
- Rust has a surprisingly large runtime, considering that its standard library is quite limited.
- C# (1)
- Debian's `libnewtonsoft-json4.5-cil` was incompatible with the one I'd used, so I used my copy of `Newtonsoft.Json.dll`.
  `apt-get install binfmt-support mono-runtime libmono-system-windows-forms4.0-cil --no-install-recommends`
Binary compatibility
Several of these programs, compiled on my Arch Linux system, failed to run on Debian because they'd picked up a dependency on GLIBC 2.14's memcpy (glibc uses symbol versioning, so that when it changes in an incompatible way, your current binaries continue working for a bit, before breaking mysteriously next time you recompile).
For the affected programs, I recompiled them on Debian. This isn't a huge problem, because we can just compile binaries on the oldest system we want to support and they will still work on newer systems.
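One rough way to spot this kind of dependency before shipping a binary is to look for the `GLIBC_x.y` version tags embedded in it, much as `strings binary | grep GLIBC_` would. The sketch below does that in Python. The function names are hypothetical, and a plain byte scan is only an approximation (it can also match unrelated strings), so a tool that reads the dynamic symbol table, such as `objdump -T`, remains the better check.

```python
import re

# Version tags look like "GLIBC_2.14" or "GLIBC_2.2.5" in the binary's string tables.
_GLIBC_RE = re.compile(rb"GLIBC_(\d+(?:\.\d+)+)")


def glibc_versions(data):
    """Return the sorted, de-duplicated GLIBC version tags found in raw bytes."""
    found = {
        tuple(int(part) for part in match.group(1).split(b"."))
        for match in _GLIBC_RE.finditer(data)
    }
    return sorted(found)


def newest_glibc_mentioned(path):
    """Return the highest GLIBC version mentioned in the file, or None."""
    with open(path, "rb") as stream:
        versions = glibc_versions(stream.read())
    return versions[-1] if versions else None
```

If `newest_glibc_mentioned` reports `(2, 14)` for a binary that must run on a system with an older glibc, that binary probably needs to be rebuilt on the older system (or have the symbol version pinned, as described for ATS below).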
- Python (5)
- Worked fine. On Debian, `/usr/bin/python` is Python 2 rather than Python 3 (as it is on Arch), which can be a hassle, but 0install is written to run on either.
- C# (5)
- .NET is a nice portable bytecode. However, the binary wouldn't run when executed directly until I installed `binfmt-support`. Debian's `libnewtonsoft-json4.5-cil` was incompatible with the one I'd used, so I bundled my copy of `Newtonsoft.Json.dll`.
- ATS (4)
- Had the GLIBC 2.14 problem. However, I didn't need to recompile on Debian, as ATS allowed me to specify the desired symbol version with a few lines of embedded C. Then the binary compiled on Arch also worked on Debian.
- OCaml (4)
- The OCaml native binary failed to work due to the GLIBC 2.14 dependency and had to be recompiled. The OCaml bytecode version failed with `Fatal error: unknown C primitive 'caml_array_blit'` and had to be recompiled. The resulting recompiled binaries worked on my modern Arch Linux system.
- Go (3)
- Worked, but only because it doesn't support dynamic linking. That's not very useful if we need to upgrade a library. It's hard to give a score here. On the one hand, it did work perfectly (and so should get a 5). On the other hand, any language can get binary compatibility by statically linking everything; that's not really what we're interested in.
- Haskell (2)
- Also failed with the GLIBC 2.14 problem. I rebuilt the binary on Debian 7, but the new binary then didn't work on my newer Arch system: `libffi.so.5: cannot open shared object file: No such file or directory`.
- Rust (2)
- Failed with `libcore-c3ca5d77d81b46c1-0.6.so: cannot open shared object file`. Rust is not available on Debian 7. I compiled Rust from source to get the libcore library (got `the compiler hit an unexpected failure path. this is a bug`, but I seemed to have a working compiler at the end anyway, in the `stage1` directory). I then hit the same GLIBC problem with my test binary. I used the new Rust compiler to rebuild the binary on Debian.
To clarify what we want to do here: Currently, to make a new release of 0install I publish a single tarball containing the Python code. Tools which depend on this library (e.g. 0compile) start using the new version automatically (they don't need to be rebuilt). Also, if a library 0install depends on (e.g. GTK or glibc) gets updated, I don't need to make a new release there either.
Safety
For me, a "safe" language is one which stops and reports a problem when something unexpected occurs at runtime. An unsafe one carries on, using incorrect data. Unsafe behaviour often causes security problems and data loss. For example, many programs, including 0install, update files atomically by writing out the new data to a new file and then renaming it over the original on success. If the function says it successfully saved the data when it didn't (e.g. because the disk was full) data loss will occur.
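The write-then-rename pattern mentioned above can be sketched as follows. This is an illustration rather than 0install's actual code: the function name is hypothetical, and it assumes Python 3.3 or later for `os.replace`. What matters is that every step that can fail (write, flush, sync, rename) raises an exception, so the caller is never told the save succeeded when it didn't.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at path with data (bytes), or raise and leave it untouched."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename stays
    # on one filesystem. Note that mkstemp creates it with mode 0600.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # a full disk should show up by here
        os.replace(tmp_path, path)  # atomically rename over the original
    except BaseException:
        # Something failed: remove the partial file and report the error.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

If the disk fills up part-way through, the original file is still intact and the exception propagates to the caller.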
As a basic test of each language's approach to safety, I took the "Hello World" example program from the language's own tutorial, compiled it, and ran it like this:
$ ./hello 1< /dev/null; echo Exit status: $?
This runs it with a read-only stdout, so the program will fail to output its message. A safe language will print an error to stderr and return a non-zero exit status to indicate failure. An unsafe language will print nothing and return 0 (success). If you're not sure why this is important, imagine the command is `dump-database > backup` and the filesystem is full.
My theory is, if the language designers can't write hello world safely, what hope do the rest of us have?
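For reference, here is a sketch of what a hello world that passes this test could look like in Python. It is an assumption about how one might write it, not any tutorial's code, and the function name `say` is made up for the example. The key step is flushing explicitly, so that a failed write is noticed while the program can still report it.

```python
import sys


def say(message, stream=None):
    """Write message to stream; return 0 on success or 1 if the write failed."""
    if stream is None:
        stream = sys.stdout
    try:
        stream.write(message + "\n")
        stream.flush()  # push buffered output now so errors surface here
    except (OSError, ValueError) as ex:
        sys.stderr.write("hello: cannot write output: %s\n" % ex)
        return 1
    return 0
```

A script would end with `sys.exit(say("Hello, world!"))`, so that a failed write turns into an error message on stderr and a non-zero exit status.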
- Rust (5)
- Amazingly, Rust was the only language to pass this test!
- ATS, C#, Go, Haskell, OCaml, Python (1)
- Rubbish.

  Update: OCaml would have passed if they'd used `print_endline`, but the tutorial used `print_string`, which doesn't abort.
Note: Isn't it a bit silly to generalise from this one data-point to the behaviour of the whole language? Yes. But as a starting off point for discussion, it's working quite well. e.g. the OCaml response was surprise that it didn't work, whereas Go users regard this behaviour as normal and expected.
Next, what does each of the sample programs do if the environment variable isn't set? The program should abort with an error message when it tries to read the environment variable.
- Python, OCaml, Haskell (5)
- All abort correctly with an exception. No special code needed.
- Rust, ATS (5)
- The compiler forces me to handle the case of the variable not being set. Good.
- C# (3)
- `getenv` returns null and continues. The program aborts later as the JSON parser can't parse null.
- Go (1)
- `Getenv` returns the empty string and continues. Then Go somehow manages to parse the empty string as an empty JSON list and still continues. Then it tries to interpret the first of the user arguments to the program as the path of the program to run and execs that instead! Utter failure.
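To make the failure modes above concrete, this sketch spells out the checks a launcher would need if the language did not perform them for it: the variable must be set, it must parse as JSON, and it must be a non-empty list of strings. The function name `launch_command` is hypothetical. Only the `0install-runenv-` prefix comes from the test program.

```python
import json


def launch_command(environ, name, user_args):
    """Return the argv to exec, or raise an exception saying what is wrong."""
    var = "0install-runenv-" + name
    if var not in environ:
        raise KeyError("Environment variable %r not set" % var)
    args = json.loads(environ[var])  # raises ValueError on malformed JSON
    if not isinstance(args, list) or not args:
        raise ValueError("%s must be a non-empty JSON list, got %r" % (var, args))
    if not all(isinstance(arg, str) for arg in args):
        raise ValueError("%s must contain only strings" % var)
    return args + list(user_args)
```

With these checks, an unset variable, an empty string or an empty list all stop the program before it reaches `execv`, rather than running whatever the user's first argument happens to be.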
Finally, does the language allow unsafe memory access (e.g. reading from memory after it has been freed)? A safe language will not allow this unless the programmer explicitly requests an "unsafe" mode.
- C#, Python, OCaml, Haskell (5)
- These languages generally don't provide any unsafe memory access (or if they do, it's an obscure feature you wouldn't use in normal code).
- Go (5)
- Go has an "unsafe" package for unsafe operations.
- Rust (4)
- Rust is safe unless you use `unsafe {}` blocks. Unfortunately, I did have to use a couple to implement the sample code, because Rust's standard library didn't provide an `execv` call and I had to make my own wrapper.
- ATS (3)
- In theory, ATS's type system prevents unsafe access. In reality, the standard library (which defines the types of functions in various C libraries) contains many unsafe type declarations. Often, you have something like a `foo_get_name: Foo -> String` function which returns a pointer into a `Foo` structure. The simplest way to define this in ATS doesn't indicate that the result is only valid until `Foo` goes away. With a little extra effort (using `minus`), you can prevent that, but it's unreasonably hard to say that the string will also become invalid if `Foo` is mutated. Rust, on the other hand, makes these things very easy to express.

  Another problem with ATS is that types may not be correct if exceptions are used. For example, the sequence `m = alloc(); use(m); free(m)` compiles (because it thinks that `m` is always freed), but if `use` throws an exception then `m` will not be freed. This can be avoided by never catching exceptions.
Language | Hello | Missing env | Memory safety |
---|---|---|---|
Rust | 5 | 5 | 4 |
Haskell | 1 | 5 | 5 |
OCaml | 1 | 5 | 5 |
Python | 1 | 5 | 5 |
ATS | 1 | 5 | 3 |
C# | 1 | 3 | 5 |
Go | 1 | 1 | 5 |
Finally, I should note that, while Python generally has safe defaults, the HTTPS handling is an exception to this (it doesn't validate the certificates), and the documentation for the XML modules in the standard library notes that "The XML modules are not secure against erroneous or maliciously constructed data.". However, I only know about these because I'm very familiar with Python - the other languages may have similar issues - so I'm not going to count it here.
Diagnostics
When a run-time exception occurs, how easily can the user diagnose the problem or search for a solution on-line? If they write to us, how helpful will the output be to us?
My test-case here is, again, the missing environment variable (if this happened, it would indicate a bug in some other part of 0install):
- Python (5)
- Python displays a stack-trace showing exactly what it was doing and gives the name of the variable it was looking for. The error includes the path to the Python code, so any programmer can easily open it and do further debugging. If the user posts the error to the mailing list, we would immediately know what was wrong. And I didn't have to write any code to make it do this. Perfect.

  ```
  Traceback (most recent call last):
    File "./runenv2.py", line 5, in <module>
      args = json.loads(os.environ["0install-runenv-" + envname])
    File "/usr/lib/python2.7/UserDict.py", line 23, in __getitem__
      raise KeyError(key)
  KeyError: '0install-runenv-runenv2.py'
  ```
- Haskell (3)
- A reasonably clear error, but no clue about where in the code it's coming from:

  ```
  Runenv: 0install-runenv-Runenv: getEnv: does not exist (no environment variable)
  ```
- Rust (3)
- The compiler forced me to add some code to handle the error. It was easy to include the environment variable name in the error, so I did. This results in a reasonably clear message, but the location is in a generic part of the standard library and not helpful:

  ```
  rust: task failed at 'Environment variable '0install-runenv-runenv' not set', /build/src/rust-0.6/src/libcore/option.rs:300
  ```

  Update: cmrx64 notes that setting `RUST_LOG=::rt::backtrace` gives a stack-trace. Currently, this isn't very useful because it doesn't include line numbers, but it looks like this is going to improve.
- ATS (2)
- Like Rust, ATS forced me to handle the error. The simplest solution here was to throw an exception, which led to the somewhat-unhelpful message:

  ```
  exit(ATS): uncaught exception: _2home_2tal_2Projects_2ats_2runenv_2edats__Missing_EnvironmentVar(1025)
  ```
- C# (1)
- Fails to detect the problem and gives `System.ArgumentNullException` at a later point in the code. You do get a stack-trace, though:

  ```
  Unhandled Exception: System.ArgumentNullException: Argument cannot be null.
  Parameter name: value
    at Newtonsoft.Json.JsonConvert.DeserializeObject (System.String value, System.Type type, Newtonsoft.Json.JsonSerializerSettings settings) [0x00000] in <filename unknown>:0
    at Newtonsoft.Json.JsonConvert.DeserializeObject[String[]] (System.String value, Newtonsoft.Json.JsonSerializerSettings settings) [0x00000] in <filename unknown>:0
    at Newtonsoft.Json.JsonConvert.DeserializeObject[String[]] (System.String value) [0x00000] in <filename unknown>:0
    at Runenv.Main (System.String[] userArgs) [0x00000] in <filename unknown>:0
  [ERROR] FATAL UNHANDLED EXCEPTION: System.ArgumentNullException: Argument cannot be null.
  Parameter name: value
    at Newtonsoft.Json.JsonConvert.DeserializeObject (System.String value, System.Type type, Newtonsoft.Json.JsonSerializerSettings settings) [0x00000] in <filename unknown>:0
    at Newtonsoft.Json.JsonConvert.DeserializeObject[String[]] (System.String value, Newtonsoft.Json.JsonSerializerSettings settings) [0x00000] in <filename unknown>:0
    at Newtonsoft.Json.JsonConvert.DeserializeObject[String[]] (System.String value) [0x00000] in <filename unknown>:0
    at Runenv.Main (System.String[] userArgs) [0x00000] in <filename unknown>:0
  ```
- OCaml (1)
- Gives a useless generic error. We'd have no clue where the problem was from this:

  ```
  Fatal error: exception Not_found
  ```

  Update: ygrek says it is possible to get the location, but you need to compile with `-g` and run with `OCAMLRUNPARAM=b`.
- Go (1)
- Fails to detect the problem at all and gives a useless error from another point in the code:

  ```
  panic: runtime error: index out of range

  goroutine 1 [running]:
  main.main()
      /home/tal/Projects/go/runenv.go:16 +0x239

  goroutine 2 [runnable]
  ```
Ease of writing
This is even more subjective than the other areas but here goes:
- Python (5)
- The code is clear, short and easy to understand even if you don't know Python (but do know POSIX):

  runenv.py:

  ```python
  import os, sys, json
  envname = os.path.basename(sys.argv[0])
  args = json.loads(os.environ["0install-runenv-" + envname])
  os.execv(args[0], args + sys.argv[1:])
  ```
- C# (4)
- A little verbose, but quite readable (update: Bastian Eicher suggests some improvements to this code):

  runenv.cs:

  ```csharp
  using System;
  using Mono.Unix.Native;
  using Newtonsoft.Json;
  using System.Collections;
  using System.Windows.Forms;
  using System.IO;

  public class Runenv {
    public static void Main(String[] userArgs) {
      String basename = Path.GetFileName(Application.ExecutablePath);
      String json = Stdlib.getenv("0install-runenv-" + basename);
      String[] progArgs = JsonConvert.DeserializeObject<String[]>(json);

      String[] argv = new String[progArgs.Length + userArgs.Length];
      progArgs.CopyTo(argv, 0);
      userArgs.CopyTo(argv, progArgs.Length);

      Syscall.execv(argv[0], argv);
    }
  }
  ```
- OCaml (4)
- This code was contributed by "ygrek" (I wrote my own version first, which was longer but also fairly clear):

  runenv.ml:

  ```ocaml
  let () =
    match Array.to_list Sys.argv with
    | [] -> assert false
    | arg0::args ->
      let var = "0install-runenv-" ^ Filename.basename arg0 in
      let s = Sys.getenv var in
      let open Yojson.Basic in
      let envargs = Util.convert_each Util.to_string (from_string s) in
      Unix.execv (List.hd envargs) (Array.of_list (envargs @ args))
  ```
- Haskell (4)
- Tim Cuthbertson contributed this Haskell version:

  runenv.hs:

  ```haskell
  module Main where

  import System.Environment (getArgs, getEnv, getProgName)
  import System.Posix.Process (executeFile)
  import Text.JSON

  main = do
    envName <- getProgName
    argv <- getArgs
    jsonContents <- getEnv $ "0install-runenv-" ++ envName
    let jsv = parseJSON jsonContents
    let program:extraArgs = parseArr jsv
    executeFile program False (extraArgs ++ argv) Nothing
    where
      parseJSON str = decode str :: Result [JSString]
      parseArr :: Result [JSString] -> [String]
      parseArr (Ok jss) = map fromJSString jss
  ```
- Go (3)
- The Go version was pretty easy to write, except that the error handling was so bad that when I got things wrong, it was hard to understand which bit was actually failing (hence why it explicitly handles the error from `Exec`; this was to help me debug it when it failed silently the first time, despite my plan to add error handling only if the compiler told me to):

  runenv.go:

  ```go
  package main;

  import "path"
  import "encoding/json"
  import "os"
  import "syscall"

  func main() {
      json_data := []byte(os.Getenv("0install-runenv-" + path.Base(os.Args[0])))

      var argv []string
      json.Unmarshal(json_data, &argv)

      argv = append(argv, os.Args[1:]...)

      err := syscall.Exec(argv[0], argv, syscall.Environ())
      if err != nil {
          panic(err)
      }
  }
  ```
- Rust (3)
- This is only part of the code; I also wrote some support code to turn a list of JSON strings into a list of Rust strings, and an implementation of `execv`. Apart from that it was pretty easy:

  runenv.rs:

  ```rust
  fn main() {
      let our_args = os::args();
      let prog :path::PosixPath = path::Path(our_args[0]);
      let basename :&str = prog.filename().expect(fmt!("Not a file '%?'", prog));
      let var :~str = ~"0install-runenv-" + basename;
      let json_str = os::getenv(var).expect(fmt!("Environment variable '%s' not set", var));
      let j = json::from_str(json_str).unwrap();

      let mut prog_args: ~[~str] = json_list_to_str_vector(&j);
      prog_args += our_args.tail();

      execv(prog_args);
  }
  ```
- ATS (1)
- It took me several days to learn enough to be able to write this, and the code ended up several hundred lines long. However, much of this was support code (e.g. code for interfacing to libjson and safer handling of execv) and it looks like this won't be needed in ATS 2 (which is not yet ready for use).

  To help understand it a little, you need to know that there are both dynamic (runtime) variables (e.g. `target_argv`, a C pointer to an array) and static (compile-time) "proofs" (e.g. `must_free_argv`). These are often written as `proof1, proof2, ... | dyn1, dyn2, ...`. `prval` lines are only used at compile-time for checking; they generate no code.

  The biggest problem programming in ATS, however, is that the compiler is very unforgiving. Most errors result in a generic `syntax error`. Putting two operators too close together often causes them to be treated as a single third operator. Errors about constraint violations are printed using the compiler's hard-to-read internal representation, etc.

  Here's a small sample (the main function):

  runenv.dats:

  ```
  [...]

  // The main function
  implement main_argc_argv(argc, argv): void =
    let
      // Get JSON string from the environment
      val prog_args_json = get_json(view@ argv | &argv)

      // Skip argv[0]
      val n_user_args = argc - 1
      prval (pr_self, pr_user_args) = array_v_uncons(view@ argv)
      val user_args: ptr = (&argv) + sizeof<string>

      // Parse the JSON
      val json_root = checked_json_tokener_parse(string1_of_strptr prog_args_json)
      val (borrowed_json | json_array) = json_object_get_array(json_root)
      val n_prog_args = array_list_get_length(json_array)
      val () = assert(n_prog_args > 0)

      // Allocate the target_argv vector to be passed to execv
      // It will contain prog_args + user_args (plus a null, added by string_array_alloc)
      val argv_len = n_prog_args + n_user_args
      val (must_free_argv, str_array_v | target_argv) = string_array_alloc (size1_of_int1 argv_len)

      // Populate target_argv from json_array and user_args
      val () = initialise_array(str_array_v, pr_user_args |
                                target_argv, size1_of_int1 n_prog_args, json_array, size1_of_int1 n_user_args, user_args)

      // Extract the program name from target_argv
      // This is just a long way of writing "target_prog = target_argv[0]"
      prval (argv_initialised, borrowed_string_array) = string_array_take_contents(str_array_v)
      prval (head, tail) = array_v_uncons(argv_initialised)
      val target_prog: string = ptr_get_t<string>(head | target_argv)
      prval argv_initialised = array_v_cons(head, tail)
      prval () = str_array_v := string_array_return_contents(argv_initialised, borrowed_string_array)

      // Exec the target program if possible
      val error_code = execv(str_array_v | target_prog, target_argv)
      val () = print("Failed to execv\n")

      // Cleanup (only reached on failure)
      prval () = minus_addback(borrowed_json, json_array | json_root)        // Don't need json_array any longer
      val () = json_object_put(json_root)                                    // Free JSON object
      val () = string_array_free(must_free_argv, str_array_v | target_argv)  // Free target_argv
      prval () = view@ argv := array_v_cons(pr_self, pr_user_args)           // Rejoin argv[0] with argv[1:]
    in
      exit(1)
    end
  ```
Shared libraries
The 0install package contains both executables and library classes for use by other tools (0compile, 0test, 0repo, etc). It must be possible to make a bug-fix release of the main library without having to make a new release of every tool that depends on it.
- ATS, C#, Python (5)
- I didn't bother to test these, as it's obvious that they support shared libraries with no problems.
- Rust (5)
- Works, but the reference compiled into the main binary includes a hash (e.g. `libmylib-68a2c114141ca-1.4.so`). The docs say that "The alphanumerics are a hash representing the crate metadata." I added an `author` field to the metadata, and sure enough the new library could no longer be used by the existing binary. It's not very clear whether there are other things that could break the hash.

  Update: see this explanation from the Rust developers. The hash is just to make the library name unique (in case someone else writes a library with the same name) and should never change. i.e. we should specify the "author" as "0install.net". In addition, symbols get a hash which includes their type signature (so incompatible changes become link-time errors) and the library hash (so you don't get symbol name conflicts between libraries). So, I've now given Rust a 5 here (was 2).
- Haskell (2)
- Shared libraries do work, but are tied to the version of ghc used to compile them. e.g. Compiling with `-dynamic` gets a dependency on `libHSjson-0.7-ghc7.6.3.so`. A Haskell library author would have to provide an enormous number of versions of their library to cover every GHC version on every platform.
- OCaml (1)
- There's also a Dynlink module for loading shared code manually at runtime. According to a StackOverflow answer: "Very often when upgrading the Ocaml compiler (e.g. from 3.12 to future 3.13), you previous `*.cmo` or `*.cma` files won't be able to work without recompilation." I haven't yet managed to make shared libraries work.

  Update: The situation is essentially the same as with Haskell; shared libraries work but if a dependency changes then everything that uses it must be recompiled. Dynlink does work, but it's really intended for plugins (where the main executable provides a fixed API for the dynamically loaded plugin).
- Go (1)
- Doesn't support shared libraries at all.
Static types
All languages here except for Python provide at least basic static type checking. After Python, Go has the most primitive system, without even user-defined polymorphic types. However, even that would save us from a number of problems common with Python code (particularly bugs in rarely tested error paths or breakage caused by library APIs changing without us noticing).
Prevention of null pointer errors
- ATS, Rust, Haskell, OCaml
- These languages distinguish in their type systems between objects and null. Therefore, if an object has type Foo, then it really is a Foo, not null, and the program can't crash with a NullPointerException or equivalent.
- C#, Go, Python
- These languages cannot distinguish between values and null at compile time.
Dependent types
ATS is the only language here with dependent types. You can do a lot of cool things with these.
For example, the `assert(n_prog_args > 0)` check in the ATS code above isn't there by accident; the ATS compiler required me to prove that the list generated by parsing the JSON wouldn't be empty, since I needed to take the first argument as the program name.
Note that the Go version failed with an `index out of range` error (see above); that cannot happen in the ATS version (it will still fail, since I used an `assert`, but it fails with a sensible error message at the correct point in the code). Using an `assert` or an `if` puts a runtime check into the program. In other cases, you may be able to insert a proof that the array won't be empty.
Managing resources (linear types)
ATS and Rust support linear types. This allows them to ensure that, for example, if a file is opened then it will also be closed again promptly, and not used after that. ATS's support is more flexible, but Rust's is much easier to use.
Haskell, OCaml, C#, Go and Python do not detect (at compile time) attempts to read from a closed file, or forgetting to close a file.
Note that all languages allow you to define a function that opens the file, calls the function with the new handle, and then closes the handle, which works well in many cases. However, this doesn't prevent the function from saving a reference to the file handle and trying to reuse it later, and doesn't allow storing handles in other objects, etc. It also doesn't work when the nesting isn't fixed (e.g. opening and reading from several TCP streams in parallel, closing each one when done).
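The open/call/close pattern and its escape problem can be sketched in Python as follows. This is only an illustration; `with_file` is a hypothetical helper, and the temporary file exists just to make the example self-contained:

```python
import os
import tempfile


def with_file(path, fn):
    """Open path, pass the handle to fn, and always close it afterwards."""
    with open(path) as handle:
        return fn(handle)


# Create a small file to read back.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as out:
    out.write("hello\n")

# Normal use: the handle is only touched inside the callback.
first_line = with_file(path, lambda f: f.readline())

# Misuse: the callback returns the handle itself, so it escapes.
leaked = with_file(path, lambda f: f)

try:
    leaked.readline()
    error = None
except ValueError as ex:
    # Reading from a closed file is only detected at run time.
    error = ex

os.remove(path)
```

The helper guarantees that the file gets closed, but nothing stops the handle from escaping, and the later read is rejected only when it actually runs.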
Bounds on privilege
When parsing XML it is useful to know, without examining the XML parser in detail, that it cannot load files from the local filesystem (some XML parsers allow XML documents to do this via their DTDs, which we want to prevent for security reasons). When unpacking an archive we'd like to know, without auditing the unzip code, that it won't write anywhere outside of the target directory. And so on.
- Haskell (5)
-
Haskell seems ideal for this, since its functions are side-effect free, though I'm not sure if it prevents libraries from using unsafe functions if they want to.
Update: Tim Cuthbertson writes: "There is a Safe Haskell feature in GHC, which does guarantee that unsafe features are not used (which can be applied when compiling a given module / package). So that's good - but I haven't used it, so I can't vouch for its practicality."
- OCaml (5)
-
OCaml has a tool called Emily which enforces object-capability rules on OCaml code. I haven't tested it yet, though.
- C# (5)
-
Looks pretty good. Bastian Eicher says: "Adding the attribute `[SecurityPermission(SecurityAction.Deny, Flags = SecurityPermissionFlag.UnmanagedCode)]` to a method prevents it from calling into unmanaged code directly or indirectly. Only data structures within the application itself can be touched and no IO is allowed. Native methods which have been deemed to be safe (e.g. retrieve the current system time) have the `[SuppressUnmanagedCodeSecurityAttribute]` attribute to bypass this restriction. Disclaimer: I have not used these features in my own code so far."
- ATS (2)
-
ATS functions can be annotated as pure, but this greatly limits what can be done (for example, a pure function can't even throw an exception).
- Go, Python, Rust (1)
-
I'm not aware of any particular security features in these languages.
Mutability
Immutable objects (objects which you can rely on not to change) make programs safer and easier to reason about, but can also be less efficient.
- Rust (5)
-
Rust's linear types mean that the compiler knows whether you hold the only pointer to something. This means that you can create an object, mutate it (e.g. while building it), then pass it as an immutable object to another function. Once the function has finished with it, you can mutate it again. Efficient and safe - perfect.
- C#, OCaml, Go (4)
-
Struct/object fields can be declared as mutable or immutable.
Update: Blax points out in the comments that Go actually provides control of whether a field is exported or not. A field is then made "immutable" by not providing a setter for it, only a getter.
- ATS (4)
-
ATS generally doesn't distinguish between mutable and immutable pointers (e.g. in the standard library), although the type system is flexible enough that you could do this for custom types. Values can be declared as `val` (immutable) or `var` (variable).
- Python (3)
-
Everything is always mutable, which can lead to bugs (e.g. mutating a list without realising that it's shared).
- Haskell (2)
-
Everything is always immutable, which has benefits but can be very annoying and inefficient when you need mutability. Since 0install is written in an imperative style, a translation into Haskell would likely be difficult.
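To illustrate the shared-list bug mentioned in the Python entry above, here is a small sketch (the function names and the `--verbose` argument are made up for the example):

```python
def add_default_args(args):
    # Mutates the caller's list in place rather than working on a copy.
    args.append("--verbose")
    return args


def add_default_args_copy(args):
    # Defensive version: build a new list and leave the input alone.
    return list(args) + ["--verbose"]


base_args = ["prog"]
run_args = add_default_args(base_args)

# Both names now refer to the same list object, so base_args changed too.
shared = run_args is base_args

safe_base = ["prog"]
safe_run = add_default_args_copy(safe_base)
```

After this runs, `base_args` has silently gained the extra argument, while `safe_base` is unchanged. Nothing in the language marks which of the two functions is the dangerous one.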
C interoperability
Everything we might want to interact with will provide at least a C API. How easy is it to use these?
- ATS (5)
-
ATS's runtime data structures are identical to C's. All you have to do is declare the C function with an ATS type signature (as the C definitions are too vague to be useful). ATS produces C code as output, and you can even embed C functions in your ATS source code and it will pass them through to the C compiler directly. As noted above, ATS does make it difficult (though not impossible) to express some common constraints (e.g. that a returned pointer will remain valid until the input structure is mutated).
- C# (4)
-
Interop with Native Libraries shows how to call C functions from C#.
- Go (4)
-
cgo makes it easy to call C from Go.
- Haskell (4)
-
C libraries can be wrapped using Haskell's FFI.
- Rust (4)
-
C functions can be declared with Rust raw pointers, but can only be called from `unsafe` code. You therefore need to write wrappers for them. An annoyance here is that Rust's types are not the same as C's, so e.g. every string has to be copied whenever you invoke a C function (Rust strings don't always have null terminators) and get a result back (Rust strings have a length header). Also, you can't use Rust's linear pointer types with C functions, because Rust assumes that there is a header block on such types. This makes interfacing with C a little less efficient than it could be (this is no worse than e.g. C# or Haskell; I just feel Rust could do better).
- Python (3)
-
Due to its huge popularity, most libraries also provide Python bindings. There's also the `ctypes` module in the standard library, but generally people seem to write Python binding code in C when they need to interact with C libraries.
- OCaml (3)
-
Interfacing OCaml with C requires writing C code.
Update: ygrek notes that there is a new OCaml-ctypes project which allows bindings without writing C.
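For comparison, the `ctypes` route mentioned in the Python entry above looks roughly like this. It is a minimal sketch that assumes a Unix-like system where the C library can be loaded this way; it will not work unchanged on Windows:

```python
import ctypes
import ctypes.util

# Assumption: a Unix-like system. If find_library returns None, CDLL(None)
# falls back to the symbols already loaded into the current process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature by hand: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"0install")
```

As with ATS, the signature has to be declared by hand, but here nothing checks that the declaration matches the real C function.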
Asynchronous code
0install needs to be able to download from multiple sites in parallel and without blocking the UI.
- C# (5)
-
Provides the `Task<T>` type for pending results and the `await` keyword to wait for them. Untested.
- Haskell (5)
-
The `Async` type is used to represent a potential future result. Untested.
- Go (5)
- Go's "goroutines" make it very easy to spawn an asynchronous task and its excellent channels make it easy for goroutines to communicate safely. However, Go does not prevent unsafe communication between threads (e.g. via shared variables).
- Rust (5)
- Like Go, Rust provides easy support for spawning light-weight threads and channels for communication. Rust's type system prevents unsafe concurrent access (every mutable object is owned by a single thread).
- OCaml (5)
- The LWT package provides support. Untested, but looks good.
- Python (4)
- Python's generators make it easy to implement co-routines for asynchronous operations. Many such libraries have been implemented, but it looks like Tulip will soon become the official solution in the standard library.
- ATS (1)
- No special features, just raw C threading.
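As a rough illustration of the generator-based approach described in the Python entry above, here is a toy round-robin scheduler. It is not how 0install or Tulip actually work; the `download` task and `run_all` helper are hypothetical, and the "network wait" is simulated by a bare `yield`:

```python
from collections import deque


def download(name, chunks):
    """Hypothetical download task; each yield hands control back."""
    received = 0
    for size in chunks:
        received += size
        yield  # pretend we are waiting for the network here
    return (name, received)


def run_all(tasks):
    """Run generator-based tasks round-robin until all have finished."""
    queue = deque(tasks)
    results = []
    while queue:
        task = queue.popleft()
        try:
            next(task)          # resume the task until its next yield
            queue.append(task)  # not finished yet, so reschedule it
        except StopIteration as done:
            results.append(done.value)  # the generator's return value
    return results


results = run_all([download("a", [10, 20, 30]), download("b", [5])])
```

The two downloads are interleaved on a single thread, and the shorter one finishes first. A real implementation would resume a task only when its socket is ready rather than polling in turn.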
Summary
Language | Rust | OCaml | Python | Haskell | ATS | C# | Go |
---|---|---|---|---|---|---|---|
Speed | 3 | 4 | 2 | 4 | 5 | 1 | 3 |
Dependencies | 2 | 4 | 5 | 3 | 5 | 1 | 3 |
Bin. compatibility | 2 | 4 | 5 | 2 | 4 | 5 | 3 |
Bad stdout | 5 | 1 | 1 | 1 | 1 | 1 | 1 |
Missing env | 5 | 5 | 5 | 5 | 5 | 3 | 1 |
Memory safety | 4 | 5 | 5 | 5 | 3 | 5 | 5 |
Diagnostics | 3 | 1 | 5 | 3 | 2 | 1 | 1 |
Ease of coding | 3 | 4 | 5 | 4 | 1 | 4 | 3 |
Shared libraries | 5 | 1 | 5 | 2 | 5 | 5 | 1 |
Static types | 5 | 4 | 1 | 4 | 5 | 3 | 2 |
Privilege bounds | 1 | 5 | 1 | 5 | 2 | 5 | 1 |
Mutability | 5 | 4 | 3 | 2 | 4 | 4 | 4 |
C interoperability | 4 | 3 | 3 | 4 | 5 | 4 | 4 |
Asynchronous calls | 5 | 5 | 4 | 5 | 1 | 5 | 5 |
Total | 52 | 50 | 50 | 49 | 48 | 47 | 37 |
So what does this tell us? There's no clear winner here (although there is a clear loser). Although the various languages differ widely in the individual aspects, overall they tend to balance out.
Update: OCaml and Python were originally joint first. However, now that Rust's library hashes have been explained, it has moved into the lead. Of course, just summing up the scores doesn't make much sense anyway; it's just a convenient way to sort them.
What would happen if we wrote 0install in ... ?
- Rust
- The language is still changing rapidly at the moment. I suspect we'd hit quite a few problems trying to use it in production. It's looking very promising though.
- OCaml
- 0install would become faster and possibly more reliable, but runtime errors would be harder to debug. We might have problems publishing updates to shared libraries. A perfectly reasonable option, though.
- Python
- Everything would stay as it is. Which, actually, is not bad at all. Maybe we could investigate other ways to improve speed and type safety without leaving Python? That's likely to be less work than a rewrite and much less risky. Cython? ShedSkin? PyPy? RPython?
- Haskell
- We'd probably have issues with binary compatibility and shared libraries, but the code might become more reliable. Converting the existing code to a purely functional style would likely be very difficult though, and there's a risk that some things would turn out to have no obvious equivalent.
- ATS
- Everything would be incredibly fast, but getting new contributors would be very difficult due to the learning curve. There's a risk of crashes as the library is not entirely memory safe, and there are likely to be changes ahead to the language. Probably writing the whole thing in ATS would be too much work for anyone.
- C#
- Performance would improve slightly on Windows, but things would get worse for Linux, Unix and OS X users due to the extra dependencies. Probably we could get some of the same improvements on Windows using IronPython.
- Go
- Go is worse than OCaml in just about every respect, so I can't see any reason to choose it if we wanted to do a rewrite.
Please post corrections and suggestions below (or on the mailing list) - thanks!
Updates for other languages:
- Vala
- Anders F Björklund's Vala code
- Haxe (not working)
- Tim Cuthbertson tries Haxe