Proving liveness with TLA - Thomas Leonard's blog

The TLA Toolbox now has support for proving liveness properties (i.e. that something will eventually happen). I try it out on the Xen vchan protocol.

Working on a liveness proof with the TLA+ Toolbox.

Table of Contents

Background
A simple channel specification
- Specification in terms of actions
- Invariants
Temporal logic
Proving liveness (simple case)
Multi-step liveness
The real protocol
Work-arounds for bugs
Conclusions

( this post also appeared on Hacker News and Lobsters )

Background

The vchan protocol is used for efficient communication between Xen virtual machines. In 2018, I made a TLA+ specification of vchan: I created a specification of the protocol from the C code, used the model checker (TLC) to test that the protocol worked on small models, and wrote a machine-verified proof of Integrity (that the data received always matched what was sent). I also outlined a proof of Availability (that data sent will eventually arrive, rather than the system deadlocking or going around in circles). But:

Disappointingly, we can't actually prove Availability using TLAPS, because currently it understands very little temporal logic

However, newer versions of TLAPS (the TLA Proof System) have added proper support for temporal logic, so I decided to take another look. In this post, I'll start with a simple example of proving a liveness property, and then look at the real protocol. If you're not familiar with TLA, you might want to read the earlier post first, though I'll briefly introduce concepts as we meet them here.

A simple channel specification

We'll start with a simple model of a one-way channel. There is a sender and a receiver, and they have access to a shared buffer of size BufferSize (which is a non-zero natural number):

CONSTANT BufferSize 
ASSUME BufferSizeType == BufferSize \in Nat \ {0}

The model just keeps track of the total number of bytes sent and received (not the actual data):

VARIABLES Sent, Got
vars == << Sent, Got >>     \* For referring to both variables together

The amount of data currently in the buffer (BufferUsed) is the difference between these counters, and the free space (BufferFree) is the difference between that and the total buffer size:

BufferUsed == Sent - Got
BufferFree == BufferSize - BufferUsed

The property we are interested in (Liveness) is that data sent eventually arrives:

Liveness ==
  \A n \in Nat :    \* For all natural numbers n:
    Sent = n ~>     \* if we have sent exactly n bytes, then eventually
      Got >= n      \* we'll have received at least that many.

Specification in terms of actions

Liveness is the property we want, but we also need to say how we plan to achieve this. In TLA, this is done by describing the initial state and the actions that can be performed at each step.

Initially, no bytes have been sent or received:

Init ==
  /\ Sent = 0
  /\ Got = 0

An action only looks at a single atomic step of the algorithm, and relates the values of the variables at the start of the step (e.g. Sent) to their values at the end (written primed, e.g. Sent').

The Send action is true when the sender increases Sent (by some amount n, limited by the free space in the buffer):

Send ==
  \E n \in 1..BufferFree :
    /\ Sent' = Sent + n
    /\ UNCHANGED Got

The Recv action is true when the receiver reads all of the data currently in the buffer:

Recv ==
  /\ BufferUsed > 0             \* Buffer must be non-empty
  /\ Got' = Got + BufferUsed
  /\ UNCHANGED Sent

Every action of our system is either a Send or a Recv:

Next ==
  \/ Send
  \/ Recv

Finally, Spec describes the behaviours of the whole system in these terms:

Spec ==
  /\ Init              \* Init is true initially
  /\ [][Next]_vars     \* Each step is a Next step, or leaves vars unchanged
  /\ WF_vars(Recv)     \* The receiver must eventually read, if it can

Invariants

It's always useful to prove some basic invariants about the system (in particular, the types of the variables):

I ==
  /\ Sent \in Nat
  /\ Got \in Nat
  /\ Sent >= Got                    \* We can't receive more than was sent
  /\ BufferUsed \in 0..BufferSize   \* The buffer doesn't overflow

Before trying to prove anything, it's good to use the model checker (TLC) to test it first. You'll need to put some limits on the number of bytes sent and checked in the model, so it isn't a full proof, but it's likely to spot most problems. In this example, it passes easily.

The general idea for proving things in TLA is to get away from temporal logic as quickly as possible and do most of the proof with simple actions. The following pattern should work for proving any invariant:

THEOREM AlwaysI == Spec => []I
<1> Init => I BY BufferSizeType DEF Init, I
<1> I /\ [Next]_vars => I' BY NextPreservesI DEF vars, I
<1> QED BY PTL DEF Spec

This says that Spec implies that I is always true because:

I is true initially.
Next steps preserve I (and so does leaving vars unchanged).
Therefore Spec ensures I will always be true.

Note: The <1> at the start of each step is the indentation level, which for some reason TLAPS can't work out by itself.

The final QED BY PTL ("by Propositional Temporal Logic") is the only step that uses temporal logic. Everything else is just regular logic.

Now all we need to do is prove that the Next action preserves I, which is straight-forward:

LEMMA NextPreservesI ==
  ASSUME I, Next
  PROVE  I'
<1> CASE Send BY BufferSizeType DEF I, Send
<1> CASE Recv BY BufferSizeType DEF I, Recv
<1> QED BY DEF Next

So for proving invariants like I we can mostly ignore temporal logic. By for proving liveness, we'll need to understand more about it...

Temporal logic

Temporal logic is a type of modal logic. In a modal logic we imagine there are many worlds, and we are in one of them. Ordinary mathematical statements refer to our world, so e.g. Sent = 4 refers to the value of Sent in our world. However, by using the modal operators, you can refer to other worlds. For example, [](Sent = 4) means that Sent = 4 is true in all worlds "reachable" from the current one.

In TLA, each world corresponds to a moment in time, and current and future times are reachable. So []X means "X will always be true". <>X means "X will eventually be true". For example:

[](Sent = 4) means that Sent = 4 is true now and always will be.
<>(Sent = 4) means that Sent = 4 is either true now or will be true at some point in the future.

Modal operators nest, so you can say things like <>[](Sent = 4) (eventually Sent will equal 4 and will remain so from then onwards), or [](F => <>G) (F always leads to G, which can also be written as F ~> G).

Proving temporal claims with TLAPS

In the proof of Spec => []I above, only one step required temporal logic (BY PTL). Having few such steps is good because these steps can be tricky.

The main problem is that TLA propositions are written in First Order Modal Logic, but TLAPS doesn't have any solvers that understand this! Instead, it has a number of solvers that understand regular non-modal logic, plus the PTL decision procedure that handles modal logic but not much else.

For example, consider this apparently simple claim:

[](Sent = 4) => Sent > 0

This says that if Sent is always equal to 4 then it is currently greater than 0. TLAPS can't prove this in a single step:

The regular solvers don't understand the [] bit.
The PTL procedure doesn't understand numbers.

Coalescing for Reasoning in First-Order Modal Logics explains in detail what's going on, but my rough understanding is that TLAPS replaces things a solver won't understand with new fresh variables.

For example, [](Sent = 4) => Sent > 0 is received by the solver as something like:

Blob1 => Sent > 0 (for a regular solver), or
[]Blob2 => Blob3 (for PTL)

Instead, you have to break it down into separate steps:

THEOREM [](Sent = 4) => Sent > 0
<1> Sent = 4 => Sent > 0 OBVIOUS
<1> QED BY PTL

In the second step, the PTL solver will receive something like this, which it can prove:

ASSUME Blob2 => Blob3
PROVE  []Blob2 => Blob3

Annoyingly, TLAPS doesn't have an easy way to show you what it replaced in this way. If it did, I think it would be much easier to learn how to use it. You can use tlapm spec.tla --debug tempfiles and look at the files generated for each solver backend, but they're quite hard to read.

Generalising

Things that can be proved without assumptions specific to the current time are always true. In the above example, PTL converted Blob2 => Blob3 from the regular solver to [](Blob2 => Blob3). This was only possible because the proof of Blob2 => Blob3 didn't depend on the current world/time.

For example:

THEOREM
  ASSUME Spec
  PROVE  [](Sent = 4) => [](Sent > 0)
<1> Sent = 4 => Sent > 0 OBVIOUS
<1> QED BY PTL (* Fails *)

We're trying to prove that if Sent is always 4 then it's always greater than 0. The problem is that we proved Sent = 4 => Sent > 0 in a context that assumed Spec, so TLAPS won't let us generalise it (even though it looks identical to what we proved before, and didn't actually use anything from Spec).

When PTL fails, the Interesting Obligations view shows e.g.

ASSUME ...
       Sent = 4 => Sent > 0 (* non-[] *),
       PTL 
PROVE  [](Sent = 4) => [](Sent > 0)

The (* non-[] *) indicates that this can't be generalised to all times; we only know it's true now.

Instead, we can do:

THEOREM
  Spec => ([](Sent = 4) => [](Sent > 0))
<1> Sent = 4 => Sent > 0 OBVIOUS
<1> SUFFICES ASSUME Spec
             PROVE  [](Sent = 4) => [](Sent > 0)
    OBVIOUS
<1> QED BY PTL DEF Spec

Doing the Sent = 4 => Sent > 0 step before introducing Spec as an assumption allowed it to be generalised. We could also have proved it as a separate lemma.

Here's an example where we make use of a non-[] assumption:

THEOREM
  ASSUME Spec
  PROVE  <>(Sent = 0)
<1> Sent = 0 BY DEF Spec, Init
<1> QED BY PTL

If Spec is true (we're at the start of the algorithm) then Sent = 0. We can't generalise to saying that Sent is always 0, but we can still use it to prove that Sent is (trivially) eventually zero. But having non-[] assumptions usually indicates that something has gone wrong.

Some assumptions are safe to have in the context however. You can assume constants (e.g NEW n \in Nat), and also things that will always be true (e.g. []I). This is fine:

THEOREM
  ASSUME NEW n \in Nat, []I
  PROVE  [](Sent = 4) => [](Sent > 0)
<1> Sent = 4 => Sent > 0 OBVIOUS
<1> QED BY PTL

Hiding definitions

Here's a simplified version of a problem I had:

LEMMA L1 ==
  ASSUME NEW n \in Nat
  PROVE  Sent >= n => [](Sent >= n)
PROOF OMITTED

THEOREM
  ASSUME NEW m \in Nat
  PROVE  Sent >= m + 1 => [](Sent >= m + 1)
BY L1  (* Fails! *)

As usual, the regular solvers can't handle the [] and PTL can't handle the numbers or the for-all. We could create a function (F) for the temporal formula and then hide its definition:

THEOREM
  ASSUME NEW m \in Nat
  PROVE  Sent >= m + 1 => [](Sent >= m + 1)
<1> DEFINE F(i) == Sent >= i => [](Sent >= i)
<1> \A n \in Nat : F(n) BY L1
<1> SUFFICES F(m + 1) OBVIOUS
<1> HIDE DEF F
<1> QED OBVIOUS

However, due to a bug in TLAPS, this is unreliable. Often a proof fails, then you rename things a bit and it passes, then you rename them back and it continues to pass! I reported this at https://github.com/tlaplus/tlapm/issues/247. Here's a screenshot of an extreme case, where TLAPS is failing to prove a conclusion that matches an assumption character for character!

TLAPS failing to prove something very obvious.

Until that's fixed, I found that the following pattern works well as a work-around:

L1_prop(n) == Sent >= n => [](Sent >= n)
LEMMA L1 ==
  ASSUME NEW n \in Nat
  PROVE  L1_prop(n)
<1> USE DEF L1_prop
<1> QED PROOF OMITTED

THEOREM
  ASSUME NEW m \in Nat
  PROVE  Sent >= m + 1 => [](Sent >= m + 1)
<1> L1_prop(m+1) BY L1
<1> QED BY DEF L1_prop

Even when this bug is fixed, I think it would be really nice if you could say e.g. BY L1(m+1) to explicitly use L1 with a particular value of n.

Proving liveness (simple case)

OK, let's use some temporal logic to prove Liveness:

Liveness ==
  \A n \in Nat :    \* For all natural numbers n:
    Sent = n ~>     \* if we have sent exactly n bytes, then eventually
      Got >= n      \* we'll have received at least that many.

One way to prove A ~> B (A always eventually leads to B) is from an action A => B' (A leads to B in one step), plus a few side conditions. We'll do that here because we conveniently have a Recv action that immediately gets us to our goal. To prove A ~> B a generally-useful pattern is:

THEOREM A ~> B
<1> A /\ Recv => B'
<1> A => ENABLED <<Recv>>_vars
<1> WF_vars(Recv)
<1> A /\ [Next]_vars => (A \/ B)'
<1> [][Next]_vars
<1> QED BY PTL

If we're at A and perform our useful action Recv then afterwards we'll be at B.
When we're at A we can perform the useful action.
If the useful action is continually possible then eventually it will happen.
If we're at A and perform any action Next, we'll either stay at A or get to B.
The system always performs Next steps.

Some examples of why these are necessary to prove Liveness:

Without 2: Recv requires at least 2 bytes in the buffer; sending a single byte fails.
Without 3: someone else has to perform Recv and we can't be sure they'll actually do it.
Without 4: the sender can Send some data and then Retract it before it is read.
Without 5: 4 wouldn't be useful.

(3) and (5) come directly from the definition of Spec. We need to write proofs for the others, but they're all non-temporal proofs and fairly simple. The only slightly tricky one is proving ENABLED:

LEMMA EnabledRecv ==
  ASSUME NEW n \in Nat, I,
         Sent >= n, ~(Got >= n)
  PROVE  ENABLED <<Recv>>_vars
BY AutoUSE, ExpandENABLED DEF Recv, I

The magic ExpandENABLED causes the definition of ENABLED to be expanded in the version given to the solver. This is only useful if the solver also has access to the definitions of all the actions used; AutoUSEprovides them automatically so you don't have to list them manually.

However, the solver has to work quite hard to prove this and I found it times out on more realistic examples. The following trick is more generally useful:

LEMMA EnabledRecv ==
  ASSUME NEW n \in Nat, I,
         Sent >= n, ~(Got >= n)
  PROVE  ENABLED <<Recv>>_vars
<1> <<Recv>>_vars <=> Recv BY DEF vars, Recv, I
<1> ENABLED <<Recv>>_vars <=> ENABLED Recv BY ENABLEDaxioms
<1> SUFFICES ENABLED Recv OBVIOUS
<1> BufferUsed > 0 BY DEF I
<1> QED BY AutoUSE, ExpandENABLED DEF Recv

Recv always modifies vars, so <<Recv>>_vars is the same as Recv.
Therefore, ENABLED <<Recv>>_vars is the same as ENABLED Recv.
Therefore, we just need to prove ENABLED Recv, which is easier for the solver.

The above uses the magic BY ENABLEDaxioms. This doesn't seem to be documented; I just found it in one of the examples. It's really fussy about the syntax used. e.g. it works if you ask to prove it in both directions with <=>, but if you ask for e.g. just => then it fails! Also, it requires <<Recv>>_vars <=> Recv rather than <<Recv>>_vars = Recv, though you can prove the first from the second. The ENABLEDaxioms_test.tla test file is useful for getting the syntax right.

See example.tla for the full proof.

Multi-step liveness

The above proof was easy because the Recv step gets us to our goal immediately. It's more complicated if we only say that each Recv step reads some of the buffer:

Recv ==
  \E n \in 1..BufferUsed :
    /\ Got' = Got + n
    /\ UNCHANGED Sent

The trick is to define a distance metric:

Dist(n, i) == Sent >= n /\ n - Got <= i

Dist(n, i) says we are within i bytes of having received the first n bytes. Then a simple inductive argument will do:

<1> DEFINE R(j) == Dist(n, j) ~> Dist(n, 0)
<1>1 R(0) BY PTL
<1>2 ASSUME NEW i \in Nat, R(i) PROVE R(i + 1)
    <2> Dist(n, i+1) ~> Dist(n, i) BY PTL, Progress
    <2> Dist(n, i) ~> Dist(n, 0) BY <1>2
    <2> QED BY PTL
<1> \A i \in Nat : R(i)
    <2> HIDE DEF R
    <2> QED BY NatInduction, Isa, <1>1, <1>2

Here, R(i) says that the algorithm works if we start at most i bytes away from our goal. R(0) is obviously true, and we show that R(i) => R(i+1). By induction, R is true for any distance. Notice again the use of HIDE to avoid confusing the solvers with temporal logic.

Then we just need to show Progress: we eventually get closer to our goal (Dist(n, i+1) ~> Dist(n, i)). That can be proved from Recv in a similar way to before. See example.tla for the full proof.

The real protocol

I've now updated vchan.tla to prove liveness for the real system. It was more complicated than the example above (because it also looks at the data being sent, has many steps, models sending interrupts, etc), but the basic approach is the same.

Note: In the real system, Sent and Got are the actual messages, so the property talks about their lengths. Sent is what the sending application has asked the vchan library to transmit (even if it hasn't been written to the shared buffer yet), and Got is what the receiving application has received from its vchan library.

Ideally, we'd like to prove something like this:

Availability ==
  \A x \in Nat :
    Len(Sent) = x ~> Len(Got) >= x

However, that doesn't work because the sender (and receiver) can decide to shut down the connection at any moment. If the sender closes the connection before transmitting all the data then obviously it might not be received. So it's actually defined like this:

(* Any data that the write function reports has been sent successfully
   (i.e. data in Sent when it is back at "sender_ready") will eventually
   be received (if the receiver doesn't close the connection).
   In particular, this says that it's OK for the sender to close its
   end immediately after sending some data.
   Note that this is not currently true of the C implementation. *)
Availability ==
  \A x \in Nat :
    x = Len(Sent) /\ SenderLive /\ pc[SenderWriteID] = "sender_ready"
      ~> \/ Len(Got) >= x
         \/ ~ReceiverLive

Note: SenderLive and ReceiverLive are badly named. SenderLive = FALSE means that the sender has written all data successfully and there is nothing more to come. ReceiverLive = FALSE means that the receiver has decided to terminate the connection early, and probably indicates an error (think EPIPE).

Although TLAPS couldn't prove liveness properties back in 2018, I had got most of the way there:

I'd defined ReadLimit as:

The number of bytes that the receiver will eventually get without further action from the sender (assuming the receiver doesn't decide to close the connection).

and likewise WriteLimit as:

The number of bytes that the sender will eventually send without further action from the other processes or the client application, assuming the connection isn't closed by either end.

I'd proved some useful invariants about them. For example:

(* Whenever the sender is blocked or idle, the receiver can read everything in
   the buffer without further action from any other process. *)
THEOREM ReadAllIfSenderBlocked ==
  ASSUME I, SenderLive, ReceiverLive,
         \/ pc[SenderWriteID] = "sender_ready"
         \/ pc[SenderWriteID] = "sender_blocked" /\ ~SpaceAvailableInt
  PROVE  ReadLimit = Len(Got) + Len(Buffer)

(* Whenever the receiver is blocked, the sender will fill the buffer (or write everything
   it wants to write) without further action from any other process. *)
THEOREM WriteAllIfReceiverBlocked ==
  ASSUME I, SenderLive, ReceiverLive,
         pc[ReceiverReadID] = "recv_await_data", ~DataReadyInt
  PROVE  WriteLimit = Len(Got) + Min(BufferSize, Len(Buffer) + Len(msg))

If ReadLimit and WriteLimit accurately predict what will happen then these properties are good evidence that the protocol works:

WriteLimit predicts that the sender will be able to fill the buffer initially.
If the receiver doesn't read it then the buffer will become full and the sender will block, at which point ReadAllIfSenderBlocked predicts all the data will be read.
If the receiver runs out of data then it will then eventually block, at which point, WriteAllIfReceiverBlocked predicts more data can be written.

I'd used TLC to check that both predictions were correct (on small models), but I hadn't been able to prove them with the old version of TLAPS.

Proving ReadLimit

The actual proof followed the same outline as the example above. The main extra bit was showing that if we haven't yet read up to the ReadLimit then the receiver process would eventually reach the code that did the read, and with the belief that there was at least one byte available. Mixing BY PTL and regular solvers was particularly annoying here and required stating lots of obvious facts like PC \in {"recv_got_len"} => PC = "recv_got_len" because PTL doesn't understand sets, etc.

It seems a bit strange to have to prove that each line of code leads to the next. That's something you can mostly just assume when writing software, but in TLA you need to be explicit. The only interesting bits were the usual things about showing code terminates:

For loops, show that they will make progress each iteration.
Show the code doesn't crash (e.g. no buffer overflow).
Show that any explicit await will eventually resume.

To help with the proofs I defined RSpec as a variant of Spec:

RSpec ==
  /\ []I
  /\ [][Next]_vars
  /\ []ReceiverLive
  /\ WF_vars(ReceiverRead)

Instead of being true only at the start (like Spec, which uses Init), this only requires our invariant ([]I) to be true. Things proved using RSpec are therefore true from any point in the protocol, which lets them be reused in other proofs.

I was able to show that ReadLimit's prediction will come true:

ReadLimitEventually_prop(n) == RSpec /\ ReadLimit >= n ~> Len(Got) >= n
LEMMA ReadLimitEventually ==
  ASSUME NEW n \in Nat
  PROVE  ReadLimitEventually_prop(n)

and from that, rather magically:

(* When the sender is waiting, the receiver will get everything sent. *)
ReaderLiveness_prop(n) ==
     /\ RSpec
     /\ SenderLive
     /\ \/ pc[SenderWriteID] = "sender_ready"
        \/ pc[SenderWriteID] = "sender_blocked"
     /\ BytesTransmitted >= n \* If we've sent n bytes in total...
     ~> Len(Got) >= n         \* then at least n bytes will eventually be received.
LEMMA ReaderLiveness ==
  ASSUME NEW n \in Nat
  PROVE ReaderLiveness_prop(n)

That eliminates all mention of ReadLimit, and expresses the property in terms meaningful to the sender.

Proving Availability

I was expecting to have to prove WriteLimit was correct too. But it turned out I didn't. Availability only said that the receiver would receive data that had been transmitted; it never claimed that the sending process would succeed in doing that! It's right there in the comment even:

(* Any data that the write function reports has been sent successfully ... *)
Availability ==
  \A x \in Nat :
    x = Len(Sent) /\ SenderLive /\ pc[SenderWriteID] = "sender_ready"
      ~> \/ Len(Got) >= x
         \/ ~ReceiverLive

Well, this is pretty bad. The point of proofs is that instead of having to check hundreds of lines of C, or dozens of lines of TLA actions, we only have to check that the definition of Availability is what we want. My definition of Availability is still true if the sender blocks forever waiting for more space in the buffer. In fact, if I change the definition of the sender process so that the application asking for data to be sent is the only possible action, then TLC reports that Availability still holds (which is true). The original blog post appeared on Reddit, Hacker News and Lobsters, and nobody else seems to have spotted this either.

Though in fairness to my past self, I did mention checking an earlier version of Availability that did require that it works end-to-end (but didn't make any claims about what happens on shutdown), and I had also tested WriteLimit, so I'm not really worried that the system doesn't work.

And so, having learnt how to do liveness proofs, which was my real goal, I decided to stop here.

Proving WriteLimit

But proof assistants are addictive, and I found myself continuing anyway. The proof of the correctness of WriteLimit was very similar to the one for ReadLimit and I got it done much faster now I knew more about how to do these proofs.

However, I also wanted to fix Availability to something useful and updated it to this:

(* We can't ensure Availability if the sender shuts down
   before it has finishing writing the message to the buffer,
   or if it tries to send another message after shutting down. *)
CleanShutdownOnly == Len(msg) = 0 \/ SenderLive

Availability ==
  /\ []CleanShutdownOnly
  /\ []ReceiverLive
  => \A n \in Nat :
        Len(Sent) = n ~>
          Len(Got) >= n

(msg is the data yet to be transmitted)

That required strengthening some of the existing proofs to talk about clean shutdowns, not just the connection never being closed, but it was fairly straight-forward.

End-to-end liveness

I used the ReadLimit and WriteLimit lemmas to prove some things about the system as a whole, e.g.

(* If WriteLimit says we'll transmit some data, it will also be received:
   1. If WriteLimit says we'll transmit some data, then we will.
   2. Eventually the sender will be at a safe point after that.
   3. Once the sender is at a safepoint, the receiver will get all the data. *)
EndToEndLive_prop(n) == WriteLimit >= n ~> Len(Got) >= n
LEMMA EndToEndLive ==
  ASSUME NEW n \in Nat, []WSpec, []RSpec, []CleanShutdownI
  PROVE  EndToEndLive_prop(n)

(* If we've received n bytes but wanted to send more, then eventually
   there will be space to send more. *)
LEMMA EventuallySpace ==
  ASSUME NEW n \in Nat,
         []WSpec, []RSpec, []CleanShutdownI,
         [](Len(Got) >= n), [](Len(Sent) > n)
  PROVE  <>(WriteLimit > n)

(* If the application wants to send n bytes then eventually WriteLimit will
   predict that we will send them. *)
LEMMA SufficientSpace ==
  ASSUME NEW n \in Nat, []WSpec, []RSpec, []CleanShutdownI
  PROVE  Len(Sent) >= n ~> WriteLimit >= n

and was then finally able to prove my improved version of Availability:

THEOREM Spec => Availability

See vchan.tla for the final version with the proofs.

Work-arounds for bugs

The biggest problem I had was the bug when mixing forall and temporal formulas, until I found the work-around above. Here are some other problems I hit and their solutions:

SUFFICES doesn't always generalise

This doesn't work:

Foo == [](Sent = 4)
THEOREM Foo => [](Sent >= 0)
<1> SUFFICES ASSUME Foo PROVE [](Sent >= 0)
    BY PTL DEF Foo
<1> Sent = 4 => Sent >= 0 BY DEF Foo
<1> QED BY PTL DEF Foo  \* Fails!

Even though the SUFFICES used BY DEF Foo, Foo still counts as a time-specific assumption. A work-around is to ASSUME []Foo and then assert Foo in a separate step:

Foo == [](Sent = 4)
THEOREM Foo => [](Sent >= 0)
<1> SUFFICES ASSUME []Foo PROVE [](Sent >= 0)
    BY PTL DEF Foo
<1> Foo BY PTL
<1> Sent = 4 => Sent >= 0 BY DEF Foo
<1> QED BY PTL DEF Foo

CASE with a temporal goal

This doesn't work:

THEOREM ASSUME [](Sent = 0) \/ [](Sent = 4)
        PROVE  [](Sent = 0 \/ Sent = 4)
<1> CASE [](Sent = 0) BY PTL
<1> CASE [](Sent = 4) BY PTL
<1> QED OBVIOUS

It fails to parse, with Non-constant CASE for temporal goal. However, if you replace CASE with the expanded form then it works:

THEOREM ASSUME [](Sent = 0) \/ [](Sent = 4)
        PROVE  [](Sent = 0 \/ Sent = 4)
<1> ASSUME [](Sent = 0) PROVE [](Sent = 0 \/ Sent = 4) BY PTL
<1> ASSUME [](Sent = 4) PROVE [](Sent = 0 \/ Sent = 4) BY PTL
<1> QED OBVIOUS

PTL and primes

This doesn't work:

THEOREM ASSUME [](Sent = 0) PROVE Sent' = 0
<1> QED BY PTL

A work-around:

THEOREM ASSUME [](Sent = 0) PROVE Sent' = 0
<1> SUFFICES (Sent = 0)' OBVIOUS
<1> QED BY PTL

Syntax matters

Things you might expect to be the same after parsing are treated differently. This fails:

THEOREM
  ASSUME [](Sent = 0 \/ Sent = 4)
  PROVE  [](\/ Sent = 0
            \/ Sent = 4)
OBVIOUS

But this passes:

THEOREM
  ASSUME [](Sent = 0 \/ Sent = 4)
  PROVE  [](Sent = 0
            \/ Sent = 4)
OBVIOUS

Other bugs

If keyboard short-cuts stop working, move the focus out of the window and back in again. Sometimes some keys stop working (e.g. Delete) while others continue (e.g. Backspace).
If the toolbox GUI fails to launch the prover due to NullPointerException, try closing the specification and reopening it.
The solver doesn't actually tell you if everything passed; you have to look in the scrollbar for little red regions. If a theorem is folded in the editor, the top-level bit will still show as red if any sub-step fails, but it will NOT appear in the scrollbar! Either expand everything and check by eye, or run tlapm from the command-line as a final check. Note that the bright green Spec Status indicator in the status bar only means that parsing the spec succeeded, not that the proof-checking passed.

Conclusions

Checking the specification with TLC was quick and easy and immediately found the shutdown bug. Actually proving the specification correct using TLAPS was much more difficult, took way longer, and didn't uncovered any additional bugs. It did reveal a mistake in my definition of Availability, but there are surely quicker ways of finding that. So, in terms of value for time spent, I guess it's only worth it for very high-value code (or if, like me, you just like proving stuff for fun).

However, I think it's good to know how to verify things, even if you don't usually do it. It has certainly made me think more carefully about exactly what you need to check to verify liveness.