Clock Synchronization Algorithms
Clock synchronization algorithms may be broadly classified as Centralized and Distributed:
Centralized Algorithms
In centralized clock synchronization algorithms one node has a real-time receiver. This node, called the time server node whose clock time is regarded as correct and used as the reference time. The goal of these algorithms is to keep the clocks of all other nodes synchronized with the clock time of the time server node. Depending on the role of the time server node, centralized clock synchronization algorithms are again of two types – Passive Time Sever and Active Time Server.
1. Passive Time Server Centralized Algorithm: In this method each node periodically sends a message to the time server. When the time server receives the message, it quickly responds with a message (“time = T”), where T is the current time in the clock of the time server node. Assume that when the client node sends the “time = ?” message, its clock time is T0, and when it receives the “time = T” message, its clock time is T1. Since T0 and T1 are measured using the same clock, in the absence of any other information, the best estimate of the time required for the propagation of the message “time = T” from the time server node to the client’s node is (T1-T0)/2. Therefore, when the reply is received at the client’s node, its clock is readjusted to T + (T1-T0)/2. 2. Active Time Server Centralized Algorithm: In this approach, the time server periodically broadcasts its clock time (“time = T”). The other nodes receive the broadcast message and use the clock time in the message for correcting their own clocks. Each node has a priori knowledge of the approximate time (Ta) required for the propagation of the message “time = T” from the time server node to its own node, Therefore, when a broadcast message is received at a node, the node’s clock is readjusted to the time T+Ta. A major drawback of this method is that it is not fault tolerant. If the broadcast message reaches too late at a node due to some communication fault, the clock of that node will be readjusted to an incorrect value. Another disadvantage of this approach is that it requires broadcast facility to be supported by the network.
2. Another active time server algorithm that overcomes the drawbacks of the above algorithm is the Berkeley algorithm proposed by Gusella and Zatti for internal synchronization of clocks of a group of computers running the Berkeley UNIX. In this algorithm, the time server periodically sends a message (“time = ?”) to all the computers in the group. On receiving this message, each computer sends back its clock value to the time server. The time server has a priori knowledge of the approximate time required for the propagation of a message from each node to its own node. Based on this knowledge, it first readjusts the clock values of the reply messages, It then takes a fault-tolerant average of the clock values of all the computers (including its own). To take the fault tolerant average, the time server chooses a subset of all clock values that do not differ from one another by more than a specified amount, and the average is taken only for the clock values in this subset. This approach eliminates readings from unreliable clocks whose clock values could have a significant adverse effect if an ordinary average was taken. The calculated average is the current time to which all the clocks should be readjusted, The time server readjusts its own clock to this value, Instead of sending the calculated current time back to other computers, the time server sends the amount by which each individual computer’s clock requires adjustment, This can be a positive or negative value and is calculated based on the knowledge the time server has about the approximate time required for the propagation of a message from each node to its own node.
Centralized clock synchronization algorithms suffer from two major drawbacks:
1. They are subject to single – point failure. If the time server node fails, the clock synchronization operation cannot be performed. This makes the system unreliable. Ideally, a distributed system, should be more reliable than its individual nodes. If one goes down, the rest should continue to function correctly.
2. From a scalability point of view it is generally not acceptable to get all the time requests serviced by a single time server. In a large system, such a solution puts a heavy burden on that one process.
Distributed Algorithms
We know that externally synchronized clocks are also internally synchronized. That is, if each node’s clock is independently synchronized with real time, all the clocks of the system remain mutually synchronized. Therefore, a simple method for clock synchronization may be to equip each node of the system with a real time receiver so that each node’s clock can be independently synchronized with real time. Multiple real time clocks (one for each node) are normally used for this purpose. Theoretically, internal synchronization of clocks is not required in this approach. However, in practice, due to inherent inaccuracy of real-time clocks, different real time clocks produce different time. Therefore, internal synchronization is normally performed for better accuracy. One of the following two approaches is used for internal synchronization in this case.
1. Global Averaging Distributed Algorithms: In this approach, the clock process at each node broadcasts its local clock time in the form of a special “resync” message when its local time equals T0+iR for some integer I, where T0 is a fixed time in the past agreed upon by all nodes and R is a system parameter that depends on such factors as the total number of nodes in the system, the maximum allowable drift rate, and so on. i.e. a resync message is broadcast from each node at the beginning of every fixed length resynchronization interval. However, since the clocks of different nodes run slightly different rates, these broadcasts will not happen simultaneously from all nodes. After broadcasting the clock value, the clock process of a node waits for time T, where T is a parameter to be determined by the algorithm. During this waiting period, the clock process records the time, according to its own clock, when the message was received. At the end of the waiting period, the clock process estimates the skew of its clock with respect to each of the other nodes on the basis of the times at which it received resync messages. It then computes a fault-tolerant average of the next resynchronization interval.
2. The global averaging algorithms differ mainly in the manner in which the fault-tolerant average of the estimated skews is calculated. Two commonly used algorithms are: 1. The simplest algorithm is to take the average of the estimated skews and use it as the correction for the local clock. However, to limit the impact of faulty clocks on the average value, the estimated skew with respect to each node is compared against a threshold, and skews greater than the threshold are set to zero before computing the average of the estimated skews. 2. In another algorithm, each node limits the impact of faulty clocks by first discarding the m highest and m lowest estimated skews and then calculating the average of the remaining skews, which is then used as the correction for the local clock. The value of m is usually decided based on the total number of clocks (nodes).