Ever heard that “clocks are unreliable” or that “you cannot trust timestamps” and wondered what that is all about? Or have you ever been asked why Google data centres have GPS clocks?
After reading this you will be able to guess the answers. We will briefly describe the issue with clocks and show the “consequences” of it using Redis cache expiration as an example.
Timestamps and local clocks
For hardware reasons, machines’ clocks are not accurate: quartz crystal oscillators run at slightly different rates, so over time the clocks drift apart.
To keep machines’ clocks in sync, a special protocol called Network Time Protocol (NTP) is used. There are command line tools to check NTP status and force a sync if required: timedatectl (backed by the systemd-timesyncd service) on Ubuntu and w32tm (for the W32Time service) on Windows.
As NTP needs to talk to a time source over a network, its precision is in the order of milliseconds on a local network and tens of milliseconds over the Internet, and it depends heavily on network latency and how much that latency varies. So even with NTP, local clocks will never agree exactly and will keep drifting apart between synchronizations.
To illustrate the problem, let’s say we put money into an account and withdraw it straight after, and the two requests are processed by two different machines, S1 and S2:
Now imagine that S1 has drifted out of sync and its clock is running fast:
Because S1’s clock is ahead of S2’s, the transactions are processed in a different order, which produces a different transaction result (Failure where it used to be Success) and a different end state (Current Balance 100).
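To make this concrete, here is a small simulation of the scenario. It is only a sketch: the `Request` type, the `apply_in_timestamp_order` function, the amounts (deposit 100, withdraw 100) and the 200 ms clock offset are all made up for illustration. Each request is stamped with its server’s local clock and the requests are then applied in timestamp order.

```python
from dataclasses import dataclass


@dataclass
class Request:
    kind: str         # "deposit" or "withdraw"
    amount: int
    true_time: float  # real arrival time in seconds (no server can observe this)
    server: str       # which machine stamps the request


def apply_in_timestamp_order(requests, clock_offsets):
    """Stamp each request with its server's local clock, then apply in stamp order."""
    stamped = sorted(requests, key=lambda r: r.true_time + clock_offsets[r.server])
    balance = 0
    results = []
    for r in stamped:
        if r.kind == "deposit":
            balance += r.amount
            results.append((r.kind, "Success"))
        elif balance >= r.amount:
            balance -= r.amount
            results.append((r.kind, "Success"))
        else:
            results.append((r.kind, "Failure"))  # insufficient funds
    return results, balance


# The deposit really happens 50 ms before the withdrawal.
requests = [
    Request("deposit", 100, true_time=0.000, server="S1"),
    Request("withdraw", 100, true_time=0.050, server="S2"),
]

in_sync = apply_in_timestamp_order(requests, {"S1": 0.0, "S2": 0.0})
s1_fast = apply_in_timestamp_order(requests, {"S1": 0.2, "S2": 0.0})  # S1 is 200 ms ahead

print(in_sync)  # deposit, then withdraw: both succeed, balance 0
print(s1_fast)  # withdraw is ordered first and fails, balance ends at 100
```

The requests and their real arrival times are identical in both runs; only S1’s clock offset changes, and that alone flips the outcome.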
Timestamping on the client machine rather than on the servers would fix the ordering of events from the same client, but not the total order of events.
Overall the issue would still exist, because client machines’ clocks can be even further apart: after all, you are in full control of your local machine’s time and can set it to anything.
More realistic example
In practice we can probably get away with timestamps for processing money transactions, as such transactions tend not to be that close together.
If events in a system are minutes apart, or an occasional ordering error can be safely fixed later on, it may be OK to disregard clocks being out of sync and the consequences of that.
A practical example where it does matter is synchronization between replicas with “last write wins” (LWW) conflict resolution.
As you can see from this illustration, LWW and the time drift may lead to lost updates.
The node with the “slow” (lagging) clock will not be able to overwrite updates made by the “fast” node until NTP kicks in and the time is synchronized. The wider the time difference between the nodes, the higher the probability of a lost update.
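The lost update is easy to reproduce with a toy LWW register. This is a minimal sketch, not how any particular database implements it: `LwwRegister`, `local_clock` and the 5 second skew are hypothetical names and numbers chosen for the example.

```python
class LwwRegister:
    """A single replicated value with last-write-wins conflict resolution."""

    def __init__(self):
        self.value = None
        self.timestamp = float("-inf")

    def merge(self, value, timestamp):
        # Keep the write with the larger timestamp and silently drop the other.
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp
            return True
        return False


def local_clock(true_time, offset):
    """Local time as seen by a node whose clock is `offset` seconds off."""
    return true_time + offset


FAST_OFFSET = 5.0  # hypothetical: the "fast" node is 5 s ahead
SLOW_OFFSET = 0.0  # the "slow" node happens to be accurate

register = LwwRegister()

# True time 10: the fast node writes "A1", stamped 15.
accepted_a = register.merge("A1", local_clock(10.0, FAST_OFFSET))

# True time 12: the slow node writes "B1", stamped 12. It happened later in
# real time, but 12 < 15, so the update is lost.
accepted_b = register.merge("B1", local_clock(12.0, SLOW_OFFSET))
value_after_b = register.value

# True time 16: the slow node writes "B2", stamped 16 > 15, so it finally wins.
accepted_b_later = register.merge("B2", local_clock(16.0, SLOW_OFFSET))

print(accepted_a, accepted_b, value_after_b, accepted_b_later, register.value)
```

In this model every write from the slow node that lands within the skew window after a write from the fast node is dropped, which is why a wider skew means more lost updates.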
How do systems deal with unreliable clocks?
A simple example is the SAML NotOnOrAfter time skew problem, which comes up rather often when implementing SSO. To account for local clocks not being in sync, the usual advice is simply to increase the SAML token validity period on the identity provider.
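A rough sketch of why a longer validity period helps. This is not taken from any SAML library: `is_still_valid`, the dates, the 90 second skew and both validity periods are invented for the illustration. The only rule modelled is that an assertion is valid strictly before its NotOnOrAfter instant, as judged by the service provider’s clock.

```python
from datetime import datetime, timedelta, timezone


def is_still_valid(not_on_or_after, sp_now):
    """Valid strictly before NotOnOrAfter, according to the service provider's clock."""
    return sp_now < not_on_or_after


# The identity provider issues the assertion at 12:00:00 by its own clock.
issued_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# Hypothetical skew: the service provider's clock is 90 s ahead, and the
# assertion arrives 1 s after it was issued.
sp_clock_skew = timedelta(seconds=90)
sp_now = issued_at + sp_clock_skew + timedelta(seconds=1)

short_validity = issued_at + timedelta(seconds=60)  # NotOnOrAfter = 12:01:00
long_validity = issued_at + timedelta(minutes=5)    # NotOnOrAfter = 12:05:00

rejected = not is_still_valid(short_validity, sp_now)
accepted = is_still_valid(long_validity, sp_now)

print(rejected, accepted)
```

With a 60 second validity period the fresh assertion already looks expired to the service provider, whereas a 5 minute period absorbs the skew.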
A more interesting example is how Redis manages to expire keys on read-only replicas. There are two constraints:
- After a specified timeout (TTL) a key should expire on both master and replica nodes.
- Due to unreliable clocks Redis cannot rely on the time on the replicas being in sync with the master’s time.
Approach 1 — Expire the keys independently on the master and the replicas.
The issue with this approach is that a replica can expire a given key at time T (by the replica’s clock) and then a command that assumes the key is still present arrives from the master at T+x.
For example, a *STORE command (SUNIONSTORE, SDIFFSTORE, SINTERSTORE) may arrive after the replica has decided that one of the keys involved has already expired.
Approach 2 — Expire the keys on the master only.
This way replicas never expire any keys themselves but wait for the master node to issue a DEL command for each expired key.
This keeps data consistent between master and replicas, but only until a network delay or partition disconnects the replicas from the master: as replicas do not expire keys themselves, they will return stale data.
What Redis actually does is a combination of the above.
Redis replicas wait for the master to send a DEL command to expire a key, but at the same time, for read queries, a replica reports a key as expired when it is logically expired according to the replica’s local clock.
This led to an amusing bug in old versions of Redis (inconsistent behaviour of EXISTS and GET on expired keys on replicas).
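The combined approach can be modelled in a few lines. To be clear, this is a toy model and not Redis source code: the `Replica` class, its method names and the use of an absolute expiry time are simplifying assumptions made for the sketch.

```python
class Replica:
    """Toy read-only replica: hides logically expired keys on reads,
    but removes a key only when the master sends DEL."""

    def __init__(self, clock):
        self.clock = clock  # callable returning the replica's local time
        self.data = {}      # key -> (value, absolute expiry time or None)

    def apply_set(self, key, value, expire_at=None):
        """Apply a write replicated from the master."""
        self.data[key] = (value, expire_at)

    def apply_del(self, key):
        """Apply the DEL that the master sends once the key expires there."""
        self.data.pop(key, None)

    def get(self, key):
        """Serve a client read using the replica's own clock."""
        entry = self.data.get(key)
        if entry is None:
            return None
        value, expire_at = entry
        if expire_at is not None and self.clock() >= expire_at:
            return None  # logically expired, but still stored
        return value

    def holds(self, key):
        """Whether the key is still physically stored on the replica."""
        return key in self.data


now = [100.0]  # fake local clock that we can move by hand
replica = Replica(clock=lambda: now[0])

replica.apply_set("foo", "bar", expire_at=110.0)
before = replica.get("foo")              # still readable

now[0] = 111.0                           # local time passes the expiry
after = replica.get("foo")               # reported as expired to clients
still_stored = replica.holds("foo")      # ...but not yet deleted

replica.apply_del("foo")                 # the master's DEL finally arrives
stored_after_del = replica.holds("foo")

print(before, after, still_stored, stored_after_del)
```

Reads consult the replica’s clock, while the stored data changes only in response to commands from the master, so a later command that touches “foo” still finds the key in place.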
Time for some local experimentation. To check how it works we can run a master on port 6379 (Redis default) and a replica on port 6380 and set key “foo” to expire on master.
Then, as it is difficult to emulate a partition when both the master and the replica are running on the same machine, let’s make the master “freeze” by running a slow Lua script:
While the master is busy churning through loops in Lua, we can query the replica for the TTL of “foo”:
And run the MONITOR command in another session so that we can see the order of commands against the replica:
As you can see, “foo” expires independently on the replica, and only once the master is released from the Lua script freeze does it send the DEL command for “foo”, which has already expired on the replica.
Unreliable clocks and total ordering of events are only a few of the many exciting aspects of distributed systems, and hopefully this was an interesting introduction. And now you can guess why one would put a GPS clock into a data centre.