DISTRIBUTED FILE SYSTEMS
File systems and databases were perhaps the first resources to be distributed and shared in networked systems. Commercial products for local networks, such as NFS, have existed for a long time. More recently, file systems distributed over wide area networks have also been developed.
The management of a distributed file system rests on two usually well-differentiated functions: a name (or directory) service and a file service. Providing acceptable performance is essential, so a compromise must be reached between keeping information locally available to reduce communication costs (caching and distribution) and consistency, which is reflected in the semantics offered to shared accesses.
Properties of distributed file systems
A file system is characterized by a set of general properties:
- provides permanent information storage;
- identifies the files in a namespace (normally structured);
- concurrent access is possible from several processes;
- in multi-user systems, it provides access protection.
A distributed file system also has the following properties:
- Transparency of identification. A unique namespace, independent of the client.
- Transparency of location. To allow a file to move from one location to another, a dynamic name-to-location mapping is required.
- Structured namespaces, and replication (caching) to avoid bottlenecks.
- Robustness against failures. The server should not be affected by client failures, which discourages keeping client state on the server. In addition, the interface offered to clients should, as far as possible, provide idempotent operations, which remain correct when (under suspicion of error) they are invoked repeatedly on the server.
- Availability and fault tolerance. Both involve some form of replication. One aspect of availability is allowing operation in disconnected mode, which requires caching entire files.
- Consistency. The aim is to preserve the semantics of centralized systems as far as possible, for example, to preserve UNIX semantics in the presence of caching or other forms of replication.
- Protection. The need for remote authentication leads to new protection models, based on credentials instead of access lists.
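The idempotence requirement above can be illustrated with a short sketch (all names here are hypothetical, not part of any real file-system API): a client that suspects a lost reply can safely repeat an idempotent read, while repeating a non-idempotent append would corrupt the file.

```python
# Hypothetical sketch of idempotent vs. non-idempotent server operations.
# DATA, read_block and append are invented names for illustration only.

DATA = {"/etc/motd": b"welcome to the cluster\n"}

def read_block(path, offset, length):
    """Idempotent: repeating the call yields the same result, no side effects."""
    return DATA[path][offset:offset + length]

def append(path, payload):
    """NOT idempotent: each retry adds the payload again."""
    DATA[path] += payload

# A client that suspects its reply was lost can safely retry read_block:
first = read_block("/etc/motd", 0, 7)
retry = read_block("/etc/motd", 0, 7)
assert first == retry  # same bytes both times
```

Retrying `append` after a lost reply, by contrast, would leave the payload in the file twice, which is why stateless designs favor operations of the first kind (e.g. "write these bytes at this offset" rather than "append these bytes").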
Examples of distributed file systems
AFS (Andrew File System)
Andrew is the name of a family of distributed file systems for UNIX developed at Carnegie Mellon University since 1983. The members of the family are:
- AFS-1 (1983). Unoptimized prototype.
- AFS-2 (1985)
- AFS-3 (1988)
- Coda (1987). Provides operation in disconnected mode.
AFS can be defined in general as a distributed file system that provides session semantics.
The AFS architecture consists of two components, one on the server side and one on the client side:
- Vice: the server code. From the client's point of view, Vice is a set of file servers interconnected in a network.
- Venus: the client code, which runs on top of the operating system in the nodes connected to Vice.
Vice files are seen as fully integrated into the local file system of each client machine.
It supports replication of subsets of the file system (volumes) that are infrequently updated (AFS-2). This technique is also used for backups (by means of read-only copies of a volume).
AFS-1, AFS-2, and Coda work with entire files; AFS-3 works with 64 KB blocks. Caching is implemented on the client's disk.
The write policy is write-on-close. Session semantics are provided by means of callbacks (as of AFS-2). When Vice sends a file to a client, it attaches a callback promise and records it. When a client closes a modified file, Vice notifies every client for which it holds a callback promise on that file that the promise is cancelled. The Venus code of a client that accesses its local copy of a file whose promise has been cancelled is responsible for fetching the new version.
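The callback mechanism described above can be sketched roughly as follows (the class and method names are illustrative, not the actual AFS code): Vice records a promise per client when it ships a file, and breaks every other client's promise when a modified file is closed.

```python
# Rough sketch of AFS-style callback promises. Names are illustrative.

class Vice:
    """Server side: tracks which clients hold a callback promise per file."""
    def __init__(self):
        self.promises = {}  # path -> set of Venus clients holding a promise

    def fetch(self, client, path):
        # When Vice ships a file, it attaches a callback promise and records it.
        self.promises.setdefault(path, set()).add(client)
        return "contents of " + path

    def store(self, writer, path):
        # A modified file was closed: cancel the other clients' promises.
        for client in self.promises.get(path, set()) - {writer}:
            client.break_callback(path)
        self.promises[path] = {writer}

class Venus:
    """Client side: trusts its cached copy while the promise is live."""
    def __init__(self, name):
        self.name = name
        self.valid = set()  # cached files whose promise is still live

    def open(self, server, path):
        if path not in self.valid:
            server.fetch(self, path)  # (re)load the file, get a fresh promise
            self.valid.add(path)

    def break_callback(self, path):
        # Promise cancelled: the next open must refetch from Vice.
        self.valid.discard(path)
```

While a client holds a live promise it can open its cached copy with no server traffic at all; this is what lets AFS scale to many clients.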
AFS-3 introduces important optimizations with respect to AFS-2. It moves Venus into the kernel (using the VFS interface, as NFS does) and defines server cells to scale the system to wide area networks.
Access domains and access rights are defined in a manner compatible with UNIX.
It provides authentication through an authentication server that, upon presentation of the user's password at login, issues tokens giving access to the file system for a pre-established period (typically 24 hours). Some versions of AFS-3 have adopted Kerberos.
Coda is an AFS variant designed to provide availability in environments subject to failures, both in the network and in the servers, which makes it suitable for mobile devices (subject to frequent disconnections) and, in general, for replicated systems that require fault tolerance.
It manages replicas of volumes following an optimistic strategy, based on two mechanisms:
- It uses version numbers and version vectors for the consistent updating of the replicas (which sometimes requires manual intervention).
- It operates on the local cache when it loses the connection, until it manages to reconnect to a server (operation in disconnected mode).
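The first mechanism can be illustrated with a minimal sketch of version-vector comparison (the function and its labels are hypothetical; Coda's actual repair machinery is considerably richer): each replica keeps one update counter per site, and two vectors that are incomparable element-wise reveal concurrent, conflicting updates.

```python
# Illustrative sketch of version-vector comparison, in the spirit of the
# mechanism Coda uses to detect conflicting replica updates.

def compare(v1, v2):
    """Compare two version vectors of equal length.

    Returns 'equal', 'dominates' (v1 has seen every update v2 has),
    'dominated', or 'conflict' (concurrent updates: repair needed).
    """
    ge = all(a >= b for a, b in zip(v1, v2))
    le = all(a <= b for a, b in zip(v1, v2))
    if ge and le:
        return "equal"
    if ge:
        return "dominates"
    if le:
        return "dominated"
    return "conflict"  # incomparable: this is the case needing intervention
```

A replica whose vector dominates can safely overwrite the other; the "conflict" case is precisely the one that the text notes sometimes requires manual intervention.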
The basic characteristics of Coda are those of AFS-2.
NFS (Network File System)
Introduced by Sun Microsystems in 1985, NFS was originally developed for UNIX. It was conceived as an open system, which has allowed it to be adopted by all UNIX families and by other operating systems (VMS, Windows), becoming a de facto standard in LANs. NFS has evolved considerably, and the current Version 4 has little in common with its predecessors, since it maintains state and can be deployed over WANs.
Servers export directories. To make a directory exportable, its path is included in a specific configuration file. Clients mount the exported directories, which then appear fully integrated into the client's file system. Mounting is performed when the operating system boots, or on demand when a file is opened, through an additional NFS service, the automounter. File operations and mount requests are served by separate daemon processes on the server (nfsd and mountd, respectively).
NFS servers are stateless, which avoids having to deal with client failures on the server. Because most operations are idempotent, handling communication errors on the client is simplified.
The sharing semantics attempts to follow UNIX, although with some limitations, mainly due to cache management and the stateless nature of the server. It offers the same protection model as UNIX although, due to the absence of state on the server, access rights are checked on each access operation to the file instead of only at open time. Initially, NFS did not adopt any authentication mechanism: the client interface included the UNIX user identifier in the RPCs, and this identifier was checked on the server, which did not prevent a user's identity from being impersonated by building an RPC outside the offered interface. Currently, NFS is usually combined with authentication systems such as Kerberos.
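The impersonation weakness described above can be sketched as follows (the data and names are invented for illustration): the stateless server checks rights on every request, but only against the uid the RPC credential claims, and nothing proves the caller actually owns that uid.

```python
# Sketch of the original NFS AUTH_UNIX weakness: the server trusts whatever
# uid the client places in the RPC credential. All names are illustrative.

FILE_MODE = {"/home/alice/secret": ("alice", 0o600)}  # path -> (owner, mode)
UIDS = {"alice": 1000, "bob": 1001}

def server_read(rpc_uid, path):
    """Checked on EVERY request (stateless server), but only against the
    uid claimed in the RPC credential."""
    owner, mode = FILE_MODE[path]
    if rpc_uid == UIDS[owner] or mode & 0o004:  # owner, or world-readable
        return "data"
    raise PermissionError(path)

# A well-behaved client library fills in the caller's real uid, but a
# hand-built RPC can simply claim to be uid 1000:
forged = server_read(1000, "/home/alice/secret")  # succeeds
```

This is why the per-request check alone is not an authentication mechanism, and why deployments add Kerberos (or NFSv4 security flavors) on top.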
NFS uses the NIS (Network Information Service) to centralize information about the location of servers.
The applications use the UNIX interface. NFS defines an interface for client-server communication, which consists of three protocols:
- Sun's RPC protocol, which defines the client-server communication format. Data is serialized according to the XDR format. Communication is based on UDP; as of Version 3, TCP is also supported for WAN communication.
- The mount protocol, for directory mount/unmount operations.
- The NFS protocol: procedures for file operations (lookup, create, read, write, delete, get attributes, ...).
Maintaining UNIX consistency is problematic. Reducing the validation period to improve consistency produces an overload due to the large number of getattr operations that are carried out.
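The trade-off above can be sketched with a toy client-side attribute cache (the names and the validity window are illustrative, not NFS parameters): a shorter window means fresher attributes but more getattr RPCs.

```python
# Toy sketch of NFS-style client attribute caching: cached attributes are
# trusted for a validity window; shrinking the window improves consistency
# but multiplies getattr traffic. Names and timings are illustrative.
import time

VALIDITY = 3.0  # seconds a cached entry is trusted without revalidation

class AttrCache:
    def __init__(self, server_getattr):
        self.getattr = server_getattr  # stands in for the getattr RPC
        self.cache = {}                # path -> (attrs, fetch_time)
        self.rpc_count = 0             # how many RPCs we actually issued

    def lookup(self, path):
        entry = self.cache.get(path)
        if entry and time.monotonic() - entry[1] < VALIDITY:
            return entry[0]            # still fresh: answer with no RPC
        self.rpc_count += 1            # stale or absent: issue getattr
        attrs = self.getattr(path)
        self.cache[path] = (attrs, time.monotonic())
        return attrs
```

With VALIDITY near zero every lookup becomes an RPC (near-UNIX consistency, heavy getattr load); with a large VALIDITY the load drops but a client may act on attributes that another client has already changed.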
Originally, mounting remote file systems was not transparent (the server had to be identified explicitly). The automounter, a utility that allows dynamic mounting of file systems on demand, improves this aspect.
Due to the lack of state, locking access to remote files requires an independent mutual-exclusion mechanism. In UNIX, a specific server, lockd, is used.
NFS is not designed to support server replication. To increase availability, parts of the file system that must sustain a very high access rate can be replicated on a set of servers, provided they are read-only. This is done through NIS, so that each replica is accessible for reading, but writes are always performed on the master copy and the replicas are updated manually.
In principle, NFS was conceived for local networks of a few dozen nodes, although improvements in LAN technology and the optimizations introduced in the latest versions of NFS support a much larger number of clients. As of Version 3, WAN configurations are possible.
PDF File:
Technologies of distributed file systems - Vega, Ernesto