Yes, we’re still talking about how Vessel + Wyrd implement something kind of like a Merkle Clock, and yet also arguably better. I wrote that the last post was supposed to be the final one in the series.
And for a general overview, that is true – this article fills in some necessary background and design detail. Call it a bonus, if you’d like.
#Recap
- We compared Merkle DAGs to Vessel’s DAG.
- The next article then described Vessel’s DAG as no worse than a Merkle DAG.
- We then explored Wyrd’s usage of the Vessel DAG, with the conclusion that the DAG provides too coarse-grained an ordering.
- Finally, we had an outline for how to provide finer grained time stamps based on monotonic clocks.
Figure: “Hexadecimal clock” by adactio is licensed under CC BY 2.0.
#Hardware Clock, System Timer, Monotonic Clock
So let’s get down to basics for a moment. It’ll be easier to work our way upward from there!
Computers tend to keep time across reboots. That is because they usually contain a hardware or real-time clock (RTC). In commodity hardware, this clock is kept running – and powered – by a battery (CMOS battery) that does little but keep this clock ticking and some BIOS settings alive.
On Linux, you can access this hardware clock with an ioctl() called RTC_RD_TIME.
On other systems… let’s spare you the details.
Operating systems know the current time because they rely on two things: the system timer being initialized, and knowing the offset between the system timer and the local time and time zone.
The system timer is little more than a counter that counts clock ticks. The human-readable time is obtained by applying the above offset. That, at least, is the TL;DR version.
When a system contains a hardware clock, the system timer is initialized from that hardware clock. When no RTC is present, network time synchronization or other means must set the correct time.
Monotonic clocks are based off the system timer without applying time/time zone offsets. They’re sort of a straight representation of the values obtained from the RTC…
…except that is not quite true.
RTCs are typically 32-bit counters. We’re all well aware that 32-bit clocks can overflow. Worse, RTCs can often be programmed to apply a pre-scaler.
Perhaps they’re not based on the Caesium standard, but they still rely on the frequency of an underlying physical element, such as a quartz crystal. These frequencies are usually much higher than one tick per second, so in order to get second-long ticks, a factor (the pre-scaler) is applied to the frequency. In this way, one might get closer to microsecond granularity, but at the cost of a wraparound after a few seconds.1
So to get from the RTC to the system timer and/or monotonic clocks, the OS has to consider RTC resolution and wraparound, and compensate for both.
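A minimal sketch of the wraparound part of that compensation, assuming a free-running 32-bit counter (the constant and helper are illustrative, not any particular kernel’s code):

```python
MASK = (1 << 32) - 1  # a 32-bit RTC counter wraps at 2^32

def extend_counter(prev_extended, raw_32bit):
    """Extend a wrapping 32-bit reading into an ever-increasing value.

    Modular subtraction yields the elapsed ticks even across a wrap,
    as long as reads happen at least once per wraparound period."""
    delta = (raw_32bit - (prev_extended & MASK)) & MASK
    return prev_extended + delta

# A raw reading of 2 just after an extended value near the 32-bit limit
# is interpreted as a wraparound, not as time running backwards.
extended = extend_counter(MASK - 1, 2)
```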
To make things worse – yes, they can get worse – more complex machines are perfectly capable of translating from the basic RTC counter to an RTC that performs this compensation already, which means the OS also needs to know and update some kind of epoch for the RTC counter.2
So rather than providing straight RTC-based values, monotonic clocks actually start counting from system boot, treat that point in time as zero, and just count upwards.
The upshot of this is twofold:
- Yes, monotonic clocks are great for measuring elapsed time, as was required in the previous post.
- It’s actually quite hard to have monotonic time keeping across system reboots.
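The first point is easy to illustrate in Python, where `time.monotonic()` exposes exactly such a clock (a sketch, assuming nothing about what the clock’s zero point is):

```python
import time

# Wall time can jump around (NTP corrections, manual changes); the
# monotonic clock only ever moves forward, so durations measured with
# it are reliable -- but its absolute value is meaningless across boots.
start = time.monotonic()
time.sleep(0.05)
elapsed = time.monotonic() - start  # always non-negative
```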
We’ll sort of need both, though. We will mostly rely on the monotonic clock,
but for $reasons sometimes need something stable across reboots. The best
we can reliably get will be very system dependent, but on commodity hardware
and OSes, it may actually be the decidedly non-monotonic gettimeofday().
But we can and will limit the damage that does.
#Monotonic Clock Epochs
So how do we use the monotonic clock, considering it resets every time the system reboots?
The implication of starting at zero every boot is that it’s possible to record two
time points with a monotonic clock time stamp of T at very different wall times
– the measurements just have to be taken at precisely the same offset after boot.
We can work around that by maintaining a strictly increasing epoch counter. Any monotonic clock offset within the same epoch is directly comparable, and any monotonic clock offset from an earlier epoch happened before a later epoch’s offsets.
The logic for that is relatively simple, and we only have to execute it if we want to record an event. We do not have to care at all about how often the system may have rebooted between events – the most we need to care about is detecting the epoch change.
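In Python terms, such `(epoch, offset)` pairs already order correctly out of the box, because tuples compare lexicographically – epochs first, monotonic offsets only within the same epoch:

```python
# (epoch, offset) time stamps as plain tuples. Python's lexicographic
# tuple comparison matches the ordering rule described above.
a = (1, 42.5)   # epoch 1, 42.5s into the epoch
b = (2, 0.1)    # anything in epoch 2 happened after everything in epoch 1
c = (2, 3.0)

ordered = sorted([c, a, b])
```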
When recording an event, check first if the current_epoch value is set:
- It is? That’s great! Record `(current_epoch, monotonic_offset)` as the event time stamp. We’re done.
- It isn’t? Now we need to do three things:
  - First, read the last recorded epoch from the resource:
    - There is none? You’re on the initial epoch. Set `current_epoch = 1`.
    - There is one? Also good. Set `current_epoch = last_epoch + 1`.
  - Record the current monotonic clock value as `monotonic_start`. All monotonic offsets will be recorded as `current_monotonic_clock - monotonic_start`, i.e. normalized to the start of the epoch.
  - Record `(current_epoch, monotonic_offset)` as the event time stamp.
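The steps above can be sketched in Python; the `EpochClock` class and its dict-based storage are illustrative stand-ins for whatever persistent resource a real node would use:

```python
import time

class EpochClock:
    """Sketch of the (epoch, monotonic offset) scheme described above."""

    def __init__(self, storage):
        self.storage = storage        # stand-in for the persistent resource
        self.current_epoch = None     # unset until the first event this boot
        self.monotonic_start = None

    def record_event(self):
        if self.current_epoch is None:
            # First event since boot: advance the persisted epoch counter.
            last_epoch = self.storage.get("last_epoch", 0)
            self.current_epoch = last_epoch + 1
            self.storage["last_epoch"] = self.current_epoch
            # All offsets are normalized to the start of the epoch.
            self.monotonic_start = time.monotonic()
        offset = time.monotonic() - self.monotonic_start
        return (self.current_epoch, offset)
```

A “reboot” then corresponds to constructing a fresh `EpochClock` over the same storage, which bumps the epoch on the next recorded event.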
We now have time stamps that can easily be ordered relative to each other. We even know the delay between subsequent events. And since all nodes share a synchronization point from the Vessel DAG, they can merge event lists flawlessly.
Well… mostly.
#Epochal Events
It’s entirely correct that we do not care how many times the monotonic clock gets reset between one epoch and the next.
But we still need to know how much time elapsed between epochs, because they don’t really mean the same thing between nodes.
Let’s illustrate this with a table of events recorded across two nodes N1 and N2. Here, we have a wall time offset for illustration purposes, and the monotonic epoch and offset relative to a shared synchronization point. For simplicity’s sake we increment the wall time offset by ten seconds each time.
| Wall Time | N1 | N2 |
|---|---|---|
| 00s | Event 1 (1, 0) | – |
| 10s | – | Event 2 (1, 0) |
| 20s | – | Event 3 (1, 10) |
| 30s | Event 4 (2, 0) | – |
| 40s | – | Event 5 (1, 30) |
The table illustrates two problems:
- Both nodes record their first event using the same time stamp, with an initial epoch of `1` and an offset relative to that epoch of `0`. There is no way for either node to understand that one event occurred ten seconds after the other.
- Epoch-based ordering would place Event 4 after Event 5, but Event 5 occurred later.
Earlier, I wrote about time elapsed between epochs. One way of looking at that first problem is to realize that there is an implied “epoch 0” at the shared (logical) synchronization point.
So how do we measure time elapsed between two epoch changes?
This, unfortunately, we cannot completely solve unless the system provides a strictly monotonic RTC. In an embedded system where we control the initialization of the system time, we can probably simply record the RTC before time and time zone offsets are applied. Meanwhile in a commodity system, we may perform initial time synchronization and then record the system time.
The point of both approaches is to do this once after reboot, and not continuously. On the assumption that more than one event is recorded per epoch, the precision of the monotonic clock will eventually outweigh the potential imprecision of this epoch delay measurement.
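As a sketch of that once-per-boot measurement (Python, with `time.time()` standing in for `gettimeofday()` and a dict as a hypothetical persistent store):

```python
import time

def measure_epoch_interval(storage):
    """Run once after boot: estimate the seconds elapsed since the
    previous epoch began, from a single wall-clock reading.

    The wall clock is not monotonic, so this is only approximate --
    but it is taken once per epoch, not continuously."""
    now = time.time()                 # stand-in for gettimeofday()
    previous = storage.get("epoch_wall_start")
    storage["epoch_wall_start"] = now
    if previous is None:
        return None                   # first epoch: nothing to compare
    return now - previous             # the epoch interval, in seconds
```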
And if a system’s event ordering is critical enough, it’ll always be possible to add a bespoke RTC for this purpose for a fistful of dollars.
#Putting Things Together
Time keeping is hard, y’all!
- We mostly record events with a time stamp of `(epoch, monotonic offset)`.
- We measure time elapsed between epochs (the epoch interval) with a clock sufficiently reliable for that purpose.
- By combining epoch interval and monotonic offset, we have a sufficiently correct monotonic time stamp for events from a logical synchronization point.
- This means that in a perfectly synchronized Vessel generation, we can correctly merge events, as they all share the same synchronization point.
- In a perfectly synchronized resource, we therefore have achieved absolute event ordering.
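To make the combination concrete, here is a sketch that merges the table’s events, using the table’s wall-time offsets as assumed epoch-interval measurements (`absolute_time` and the per-node start dictionaries are illustrative names, not part of Vessel or Wyrd):

```python
def absolute_time(epoch_starts, timestamp):
    """Map an (epoch, offset) stamp to seconds since the shared
    synchronization point. epoch_starts[e] is the accumulated epoch
    interval: when epoch e began, relative to that point."""
    epoch, offset = timestamp
    return epoch_starts[epoch] + offset

# Assumed interval measurements matching the table above:
n1_starts = {1: 0, 2: 30}   # N1's epochs began 0s and 30s after sync
n2_starts = {1: 10}         # N2's single epoch began 10s after sync

events = [
    ("Event 1", absolute_time(n1_starts, (1, 0))),
    ("Event 4", absolute_time(n1_starts, (2, 0))),
    ("Event 2", absolute_time(n2_starts, (1, 0))),
    ("Event 3", absolute_time(n2_starts, (1, 10))),
    ("Event 5", absolute_time(n2_starts, (1, 30))),
]
merged = sorted(events, key=lambda e: e[1])
```

With the epoch intervals known, the merge recovers the table’s wall-time order – including placing Event 4 before Event 5, which epoch-only ordering got wrong.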
Unlike Merkle Clocks, we also have another property: we have (not quite accidentally) achieved a logical clock that counts in seconds! So can we relate that to wall time? Did we accidentally achieve proper time synchronization?
No, we did not. But if e.g. the time synchronization based on the space link was applied, we could!
See, the problem we haven’t solved is how we define the logical synchronization point. Intuitively, when a shared ancestor extent is created, we might record the time of that – but there is not necessarily any relation between that system’s time and the time of a system receiving the extent.
We can therefore only record an epoch 0 when this shared ancestor arrives at its destination. Since two receiving nodes may receive this ancestor at different delays after its creation, there is no good way to correlate those epoch 0 events, unless that is somehow supplied from the outside.
So for all intents and purposes, we still have a logical clock. Within one DAG generation, we have wall-time aligned ordering, but from one generation to the next, the time “jumps”.
But it wasn’t the goal to achieve time synchronization anyway. All we needed was a better logical clock. And the logical clock we’ve got is pretty good – it seems to make the Wyrd + Vessel combination ideal for DTN applications, at any rate!