Brief Introduction
In distributed system, there are many technological means of communication like;
- Using the Network Protocol Stack
- Remote Procedure Call - Remote Procedure Call – RPC
- Remote Object Invocation – Java
- Remote Method Invocation
- Message Queuing Services – Sockets
- Stream-oriented Services
In this write-up, I am going to be specific on using the network protocol stack as a mechanism in distributed system.
Communication in Distributed Software Development is an area of study that considers communication processes and their effects when applied to software development in a globally distributed development process. The importance of communication and coordination in software development is widely studied [1] and organizational communication studies these implications at an organizational level. This also applies to a setting where teams and team members work in separate physical locations.
Decades of experimentation" with parallel and distributed computing has established the importance of handling real-world applications. Enormous amount of research is being invested into exploring the nature of a general, cost-effective, scalable yet powerful computing model that will meet the computational and communication requirements of the wide range of applications that comprise the Grand Challenges (climate modeling, fluid turbulence, pollution dispersion, human genome, ocean circulation, quantum chromodynamics, semiconductor modeling, superconductor modeling, etc.).
Networks and Distributed Systems
Until the 1980s, computer systems were large and expensive, and there was no meaningful way to link them together. The 1980s, however, saw two important developments - powerful microprocessors and higher-speed networks (LANs/WANs now up and over a gigabit in speed). More recent developments including research are scalable cluster-based computer (e.g. Beowulf) and peer-to-peer networks (e.g., BitTorrent, SETI@Home, etc). The result is networked computers.
It is now feasible to design and implement computing systems with large numbers of networked computers. Advantages to this include shared resources, such as hardware (file stores, printers, CPU, etc) and software (files, databases, etc), to speed up computation (partitioned computations can have parts executing concurrently within a system), reliability (if one site fails, others may be able to continue) and easier communication.
Network Protocols
Seperate machines are connected via a physical network, and computers pass messages to each other using an appropriate protocol. Networking operating system services interface to the network via the kernel and contains complex algorithms to deal with transparency, scheduling, security, fault tolerance, etc. They provide peer-to-peer communication services and user applications directly access the services, or the middleware.
Middleware is a high level of OS services. Protocol is a "well-known set of rules and forats to be sued for communications between processes in order to perform a given task". Protocols specify the sequence and format of messages to be exchanged.
We can consider a protocol that allows the receiver to request a re-transmission across a data-link (on message error):
Protocol Layers
Protocol layers exist to reduce design complexity and improve portability and support for change. Networks are organised as series of layers or levels each built on the one below. The purpose of each layer is to offer services required by higher levels and to shield higher layers from the implementation details of lower layers.
Each layer has an associated protocol, which has two interfaces: a service interface - operations on this protocol, callable by the level above and a peer-to-peer interface, messages exchanged with the peer at the same level.
No data is directly transferred from layer n on one machine to layer n on another machine. Data and control pass from higher to lower layers, across the physical medium and then back up the network stack on the other machine.
As a message is passed down the stack, each layer adds a header (and possibly a tail such as a checksum, although this is most common at the bottom of the stack) and passes it to the next layer. As a message is received up the stack, the headers are removed and the message routed accordingly.
As you move down the stack, layers may have maximum packet sizes, so some layers may have to split up the packet and add a header on each packet before passing each individual packet to the lower layers.
Protocol Layers
Protocol layers exist to reduce design complexity and improve portability and support for change. Networks are organised as series of layers or levels each built on the one below. The purpose of each layer is to offer services required by higher levels and to shield higher layers from the implementation details of lower layers.
Each layer has an associated protocol, which has two interfaces: a service interface - operations on this protocol, callable by the level above and a peer-to-peer interface, messages exchanged with the peer at the same level.
No data is directly transferred from layer n on one machine to layer n on another machine. Data and control pass from higher to lower layers, across the physical medium and then back up the network stack on the other machine.
As a message is passed down the stack, each layer adds a header (and possibly a tail such as a checksum, although this is most common at the bottom of the stack) and passes it to the next layer. As a message is received up the stack, the headers are removed and the message routed accordingly.
As you move down the stack, layers may have maximum packet sizes, so some layers may have to split up the packet and add a header on each packet before passing each individual packet to the lower layers.
Some protocol design issues include identification (multiple applications are generating many messages and each layer needs to be able to uniquely identify the sender and intended recipient of each message), data transfer alternatives (simplex, half and full duplex), EDC error control (error detection and correction - depends on environment e.g., SNR and application requirements), message order preservation and swamping of a slow receiver by a fast sender (babbling idiot problem, when a failed node swamps a databus with nonsense).
We can also consider two types of protocols, connection vs. connectionless. Connection orientated services are where a connection is created and then messages are received in the order they are sent, a real world analogy is the POTS. Connectionless services are where data is sent independently. It is dispatched before the route is known and it may not arrive in the order sent, a real world analogy here is the postal system.
OSI Reference Model
The ISO Open Systems Interconnections (OSI) reference model deals with connecting open networked systems. The key principes of the OSI are:
1. layers are created when different levels of abstraction are needed
2. the layer should perform a well defined function
3. the layer function should be chosen with international standards in mind
4. layer boundaries should be chosen to minimise information flow between layers
5. the number of layers should be: large enough to prevent distinct functions being thrown together, but small enough to prevent the whole architecture being unwieldy
Common socket operations include:
- socket(domain, type, procedure) - create a socket
- bind(s, port) - associate the socket s with a specified port
- listen(port, backlog) - server listens for connection requests on a port
- connect(s, serverport) - client connects to a server using socket s
- accept(s, clientport) - server accepts connection
- sendto(s, data, clientport) - client/server sends data to a port
- receive(ip, port) - client/server receives data from a port