yvsou.com

A REVIEW OF SYNCHRONIZATION IN DISTRIBUTED SYSTEMS

INTRODUCTION

The advantages of the distributed systems are so attractive that there is a gradual shift from the centralized system era to the distributed systems era. The distributed systems are faster and cheaper when compared to the centralized systems. In a distributed system, a set of processors, each with its own internally built-in hardware clock, communicates by message transmission. But they do not have access to a central global clock. The hardware clock of each processor tends to drift apart even if all the processor clocks are set to a common time value. The drifting of these processors can be due to instabilities inherent in source oscillators and environmental conditions such as temperature, air circulation and mechanical stresses. For real time software applications and related processes, highly accurate and synchronized time is a necessity. Clock inaccuracy can cause a number of problems. Even if it is a difference of a minute or two, the outcomes may be unacceptable for the application.

A distributed system is designed to realize some synchronized behavior, especially in real-time processing in areas like factories, aircraft, space vehicles, and military. Synchronization of individual clocks becomes very important in case of certain hard and risky real time applications like, where predictable performance is of major concern, one need to preserve a total logical or temporal scheduling of the tasks in the system.

Clock inaccuracies occur due to certain instabilities inherent in source oscillators and environmental conditions such as temperature, air circulation and mechanical stresses. The clocks in the different nodes need to be synchronized to limit the inaccuracies and hence implement the objectives of distributed system in an efficient manner. Hence, clock skew and drifts which forms the major source of clock inaccuracy needs to be monitored continuously. In certain applications it is not just enough to synchronize the various processes but also the various events that constitute them. This is called intraprocess synchronization. Intraprocess concurrency is captured by relation affects or causally affects. Bit matrix clocks and hierarchical clocks which evolved after the logical and vector clocks of Lamport capture affects relation between events of process. However both have a major disadvantage in terms of increased storage and communication overhead. Difference clocks captures intraprocess and interprocess concurrency, at the same time with reduced storage and communication overheads.

SYNCHRONIZATION IN DISTRIBUTED SYSTEMS

Distributed System is a collection of computers connected via the high speed communication network. In the distributed system, the hardware and software components communicate and coordinate their actions by message passing. Each node in distributed systems can share their resources with other nodes. So, there is need of proper allocation of resources to preserve the state of resources and help coordinate between the several processes. To resolve such conflicts, synchronization is used. Synchronization in distributed systems is achieved via clocks.

The physical clocks are used to adjust the time of nodes. Each node in the system can share its local time with other nodes in the system. The time is set based on UTC (Universal Time Coordination). UTC is used as a reference time clock for the nodes in the system.

The clock synchronization can be achieved by 2 ways: External and Internal Clock Synchronization.

External clock synchronization is the one in which an external reference clock is present. It is used as a reference and the nodes in the system can set and adjust their time accordingly.

Internal clock synchronization is the one in which each node shares its time with other nodes and all the nodes set and adjust their times accordingly.

There are 2 types of clock synchronization algorithms: Centralized and Distributed.

Centralized is the one in which a time server is used as a reference. The single time server propagates its time to the nodes and all the nodes adjust the time accordingly. It is dependent on single time server so if that node fails, the whole system will lose synchronization. Examples of centralized are- Berkeley Algorithm, Passive Time Server, Active Time Server etc.

Distributed is the one in which there is no centralized time server present. Instead the nodes adjust their time by using their local time and then, taking the average of the differences of time with other nodes. Distributed algorithms overcome the issue of centralized algorithms like the scalability and single point failure. Examples of Distributed algorithms are – Global Averaging Algorithm, Localized Averaging Algorithm, NTP (Network time protocol) etc.

CLOCK SYNCHRONIZATION

Logical clocks

Lamport's algorithm

Lamport realized three important things about this problem of distributed systems and clocks

processes that don't interact don't matter (need a common clock)

event ordering is key, rather than true time

absoute correctness is less important than consistency (logical versus physical clocks)

Lamport developed a relation to describe the algorithm for making an absolute event ordering in a distributed system

a->b (a happens-before b)

If a and b are in the same process and a occurs before b, then a->b is true.

If a is a message send, and b is the message receive, then a->b, regardless of the relationship between clocks.

If a->b and b->c, then a->c (transitive property)

If events a and b are in two separate processes that don't communicate (even indirectly through a third party), then you can't say a->b or b->a. So we call a and be concurrent events.

The distributed clock time is C(a) for an event a. All the CPUs in the system agree on C(a), and if a->b, then C(a) < C(b)

The clock time C(a) always advances forward. It can skip ahead, but never go in backwards.

Example

three machines, three clocks, all start at time 0 synchronized, but 0 runs slower, 2 runs faster than 1

message x sent from 0 to 1 - x arrives after it is sent ok

message y sent from 1 to 2 - y arrives afer it is sent ok

message a sent from 2 to 1 - a arrives before it is sent problem

message b sent from 1 to 0 - b arrives before it is sent problem

messages a and b take negative time to arrive

Lamport's algorithm fixes the problem, allows for unambiguous and correct ordering of events

Include send time of message in message.

If message arrives before it is sent, advance receiver's clock to one tick more than send time of message.

Insure that clocks are advanced by at least one tick between any two message sends from the same machine.

Result

C(send) < C(receive) for all messages

This allows CPUs to exchange messages and know which one happened first.

Lamport's algorithm thus gives you a distributed logical clock for ordering events. Many algorithms rely on such a logical clock.

Physical clocks

Real-time clocks are synchronized to the official time of the world. The official time is based on the length of the day, as viewed by when the sun makes a complete transit (apparent) of the Earth. This time is lengthening as the world slows. The standard based on observations of the sun is known as Greenwich Mean Time.

This is currently measured by counting the number of times it takes the Cesium 133 atom to make 9,192,631,770 transitions. Fifty or so atomic clocks are averaged to arrive at the TAI (International Atomic Time) which is the number of seconds since midnight January 1, 1958.

Leap seconds are periodically inserted to keep TAI in synch with the actual solar day. The result is the official time standard for the world, UTC (Universal Coordinated Time).

Distributing the UTC is tricky. Some approaches are

shortwave radio broadcast, +- 10ms

satellite service, +- 0.5ms

telephone, +- 5ms

Synchronized real-time clocks

Suppose you can get one of your computers to syncrhonize on UTC. How do you share this real clock with other computers you manage?

If you can synchronize just once then you're going to soon see differences as the clocks diverge do to skew. What you want is for the clock for machine p, Cp(t) to be the right time.

Cp(t) = t for all p and t

which means that dC/dt = 1

Modern timers are only accurate to about 1 part in 10-5 which means the worst case fast timer clock is slow by 36ms per hour and the worst case fast timer is fast by 36ms per hour (about 0.9 seconds per day - I've known computers that are much worse).

To guarantee that the maximum drift is never bigger than some threshold value, the clocks must periodically be re-synchronized.

Cristian's algorithm

Assumes that a certain machine, the time server, is synchronized to UTC in some fashion. Periodically all other clocks are synchronized with the time server.

The other machines send a message to the time server, which responds with CUTC in a response, as fast as possible. The trouble with just setting a client's clock to this value is that you don't want time to move backwards which would happen to clients with fast clocks. So instead the value by which the client's clock is updated on each timer interrupt is changed. For example, if the interrupt normally increments the clock by 10ms, then a fast clock could be synchronized to the right time by only incrementing every 9ms.

The other problem with this algorithm is that the CUTC in the response is older by the two-way propagation delay of the message. This is corrected for by measuring the time it takes to send the request and receive the response, assume the delay is symmetrical, and add half the measured time to the CUTC response. Further refinements are to estimate the time the server takes in handling the message.

An improvement is to send several message and make estimates from them and take their average.

USES OF SYNCHRONIZED CLOCKS

What can you do if you have synchronized real-time clocks accurate to within a few milliseconds?

At-most-once message delivery semantics

Most strategies to insure at-most-once semantics give a unique id to each transaction or message. The server then keeps track of which messages it has seen by id. If the server crashes the list of messages received may be lost, so duplicates may be accepted. Even without crashes, how long should the server retain message ids?

With a global clock you can add a timestamp means something to both the sender and the receiver. Each message has a connection identifier and a timestamp. If a sender has to resend a message it uses the same timestamp for the duplicate as for the original. In effect the timestamp is the unique message id.

The server keeps track for each connection the value of the last timestamp it has seen. Messages arriving with a timestamp earlier than the most recently received timestamp are rejected as duplicates.

The server throws away timestamps when they are older than

G = CurrentTime - MaxLifetime - MaxClockSkew

No messages can arrive for known connections that have a timestamp older than G, since MaxLifetime is the maximum amount of time a message can bounce around the network.

Messages arriving for previously unknown connections are rejected if they are older than G, and accepted if they are younger than G.

Every deltaT the current value of G is written to disk (non-volatile storage). When a server crashes and reboots it reads G from the disk and adds deltaT. Some new messages may be rejected, but under no circumstances will duplicates be accepted. Note that this is what is desired "at-most-once" not what the application programmer might want.

Clock-based filesystem cache consistency

If a distributed filesystem allows copies of files to be cached locally in clients for performance reasons, then a consistency problem arises if multiple copies are outstanding and written to. To eliminate this problem only one client can have a file "checked out" if it is writing. Copies of the file that are in other clients must be invalidated first. Some of those copies may be very old and no longer active, but still the inactivation must be done.

To eliminate this extra overhead of invalidation, the server gives out cached copies with a lease. When the lease expires the copy is automatically invalidated in the cache. The client may extend the lease before it expires if it is still using the file. If the lease expires but the file is in the cache, the client sends a request for the file with the timestamp of the cache copy. If the cached copy is still valid (hasn't been modified since being copied) the lease is returned but the file doesn't have to be transferred.

When multiple copies are outstanding the server has to ask for invalidation of the lease if one client wishes to write to the file. If a client has crashed there will be no response to this invalidation request. In this case the server can simply wait until it knows that the lease has expired. In this way the server doesn't have to distinguish between a dead client and one which is simply slow to respond.

REFERENCES

Lamport, L.: Time. clocks and the ordering of events in a distributed system. Communications of the ACM 21(7), 558–564 (1978) zbMATHCrossRefGoogle Scholar

Vaidehi, S., Ram, D.J., Shukla, A.: Difference clocks-A new scheme for logical time in distributed systems. IEE Proc.-Comput. Digit. Tech. 143(6), 426–430 (1996) CrossRefGoogle Scholar

Distributed Systems: Clock Synchronisation, UTC, Clock drift and skew (2010), http://www.krzyzanowski.org/rutgers/lectures/l-clocks.html

K.G. Shin and P. Ramanathan, “Transmission Delays in Hardware Clock Synchronization,”IEEE Trans. Computers, Vol. c-37, NO. 11, NOV. 1988, pp. 1,465-1,467.

K.G. Shin and P. Ramanathan, “Clock Synchronization of a Large Multiprocessor System in the Presence of Malicious Faults,” IEEE Trans. Computers, Vol. C-36, No. 1, Jan. 1987, pp. 2-12.

A Review of Synchronization in Distributed Systems Version 0
👤 Author: by kabirssulaimangmailcom 2020-03-31 11:48:08

Reversion History