A REVIEW OF DISTRIBUTED FILE SYSTEMS (DFSs)
INTRODUCTION
The need to share resources in a computer system arises due to economics or the nature of some applications. In such cases, it is necessary to facilitate sharing of long-term storage devices and their data. This short review discusses Distributed File Systems (DFSs) as the means of sharing storage space and data.
A file system is a subsystem of an operating system whose purpose is to provide long-term storage. It does so by implementing files: named objects that exist from their explicit creation until their explicit destruction and are immune to temporary failures in the system. A DFS is a distributed implementation of the classical time-sharing model of a file system, where multiple users share files and storage resources. The UNIX time-sharing file system is usually regarded as the model [Ritchie and Thompson 1974]. The purpose of a DFS is to support the same kind of sharing when users are physically dispersed in a distributed system. A distributed system is a collection of loosely coupled machines, each either a mainframe or a workstation, interconnected by a communication network. Unless specified otherwise, the network is a local area network (LAN). From the point of view of a specific machine in a distributed system, the rest of the machines and their respective resources are remote, while the machine's own resources are local.
DISTRIBUTED FILE SYSTEMS (DFSs)
A distributed file system is a client/server-based application that allows clients to access and process data stored on the server as if it were on their own computer. When a user accesses a file on the server, the server sends the user a copy of the file, which is cached on the user's computer while the data is being processed and is then returned to the server.
Ideally, a distributed file system organizes file and directory services of individual servers into a global directory in such a way that remote data access is not location-specific but is identical from any client. All files are accessible to all users of the global file system and organization is hierarchical and directory-based.
Since more than one client may access the same data simultaneously, the server must have a mechanism in place (such as maintaining information about the times of access) to organize updates so that the client always receives the most current version of data and that data conflicts do not arise. Distributed file systems typically use file or database replication (distributing copies of data on multiple servers) to protect against data access failures.
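The consistency mechanism described above can be sketched in a few lines. The sketch below is a minimal illustration, not any particular system's protocol: all class and method names are hypothetical, and a per-file version counter stands in for the access/modification timestamps the text mentions (a counter makes the staleness check deterministic). The client keeps a cached copy together with the version it saw at fetch time, and revalidates against the server before reusing it.

```python
class FileServer:
    """Holds the master copy of each file plus a version number
    that increases on every write (stand-in for a modification time)."""
    def __init__(self):
        self.files = {}     # name -> data
        self.versions = {}  # name -> version number of last write

    def write(self, name, data):
        self.files[name] = data
        self.versions[name] = self.versions.get(name, 0) + 1

    def read(self, name):
        return self.files[name], self.versions[name]

    def version(self, name):
        return self.versions[name]


class CachingClient:
    """Caches file copies locally and revalidates them against the server."""
    def __init__(self, server):
        self.server = server
        self.cache = {}  # name -> (data, version seen at fetch time)

    def read(self, name):
        if name in self.cache:
            data, seen = self.cache[name]
            if self.server.version(name) == seen:
                return data  # cached copy is still current
        data, version = self.server.read(name)  # stale or absent: refetch
        self.cache[name] = (data, version)
        return data
```

Real systems differ mainly in *when* this validity check runs: on every access, on file open, or not at all until the server calls back to invalidate the cached copy.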
Sun Microsystems' Network File System (NFS), Novell NetWare, Microsoft's Distributed File System, and IBM/Transarc's DFS are some examples of distributed file systems.
To explain the structure of a DFS, we need to define the terms service, server, and client [Mitchell 1982]. A service is a software entity running on one or more machines and providing a particular type of function to a priori unknown clients. A server is the service software running on a single machine. A client is a process that can invoke a service using a set of operations that form its client interface. Sometimes a lower-level interface is defined for the actual cross-machine interaction; when the need arises, we refer to this interface as the intermachine interface. Clients implement interfaces suitable for higher-level applications or direct access by humans.
Using the above terminology, we say a file system provides file services to clients. A client interface for a file service is formed by a set of file operations. The most primitive operations are Create a file, Delete a file, Read from a file, and Write to a file. The primary hardware component a file server controls is a set of secondary storage devices (e.g., magnetic disks) on which files are stored and from which they are retrieved according to the client's requests. We often say that a server, or a machine, stores a file, meaning the file resides on one of its attached devices. We refer to the file system offered by a uniprocessor, time-sharing operating system (e.g., UNIX 4.2 BSD) as a conventional file system.
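The four primitive operations forming the client interface can be stated as an abstract interface with one trivial in-memory server behind it. This is an illustrative sketch only; the class and method names are hypothetical, and a real file service would address files by handles and operate over a network rather than on a local dictionary.

```python
from abc import ABC, abstractmethod


class FileService(ABC):
    """The four primitive operations of a client interface."""

    @abstractmethod
    def create(self, name): ...

    @abstractmethod
    def delete(self, name): ...

    @abstractmethod
    def read(self, name, offset, count): ...

    @abstractmethod
    def write(self, name, offset, data): ...


class InMemoryFileService(FileService):
    """Toy server: files are byte arrays in a dictionary."""

    def __init__(self):
        self.files = {}  # name -> bytearray

    def create(self, name):
        self.files[name] = bytearray()

    def delete(self, name):
        del self.files[name]

    def read(self, name, offset, count):
        return bytes(self.files[name][offset:offset + count])

    def write(self, name, offset, data):
        f = self.files[name]
        if len(f) < offset:                      # zero-fill any gap
            f.extend(b"\0" * (offset - len(f)))
        f[offset:offset + len(data)] = data
```

Separating the abstract interface from its implementation mirrors the service/server distinction above: the same client interface can be backed by a local server, a remote one, or a replicated group.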
A DFS is a file system, whose clients, servers, and storage devices are dispersed among the machines of a distributed system. Accordingly, service activity has to be carried out across the network, and instead of a single centralized data repository there are multiple and independent storage devices. As will become evident, the concrete configuration and implementation of a DFS may vary. There are configurations where servers run on dedicated machines, as well as configurations where a machine can be both a server and a client. A DFS can be implemented as part of a distributed operating system or, alternatively, by a software layer whose task is to manage the communication between conventional operating systems and file systems. The distinctive features of a DFS are the multiplicity and autonomy of clients and servers in the system.
DFS SERVICES
A distributed file system provides the following types of services:
Storage service: Allocation and management of space on secondary storage devices, thus providing a logical view of the storage system.
True file service: Includes file-sharing semantics, a file-caching mechanism, a file-replication mechanism, concurrency control, a multiple-copy update protocol, etc.
Name/Directory service: Responsible for directory-related activities such as creation and deletion of directories, adding a new file to a directory, deleting a file from a directory, changing the name of a file, moving a file from one directory to another, etc.
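The directory-related operations listed for the name/directory service can be collected into one small sketch. This is a hypothetical, flat illustration (one dictionary mapping a directory path to its set of file names); a real directory service would be hierarchical and would coordinate with the storage service rather than hold names in memory.

```python
class DirectoryService:
    """Maps each directory path to the set of file names it contains."""

    def __init__(self):
        self.dirs = {"/": set()}

    def make_dir(self, path):
        self.dirs[path] = set()

    def remove_dir(self, path):
        if self.dirs[path]:
            raise OSError("directory not empty")
        del self.dirs[path]

    def add_file(self, path, name):
        self.dirs[path].add(name)

    def remove_file(self, path, name):
        self.dirs[path].remove(name)

    def rename_file(self, path, old, new):
        self.dirs[path].remove(old)
        self.dirs[path].add(new)

    def move_file(self, src, dst, name):
        self.dirs[src].remove(name)
        self.dirs[dst].add(name)
```

Note that every operation touches only name-to-directory bindings, never file contents, which is what lets the name service be separated from the true file service.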
DESIRABLE FEATURES OF A DFS
- Transparency
  - Structure transparency: Clients should not know the number or locations of file servers and storage devices. Note that multiple file servers are typically provided for performance, scalability, and reliability.
  - Access transparency: Both local and remote files should be accessible in the same way. The file system should automatically locate an accessed file and transport it to the client's site.
  - Naming transparency: The name of a file should give no hint as to its location, and the name must not change when the file moves from one node to another.
  - Replication transparency: If a file is replicated on multiple nodes, both the existence of multiple copies and their locations should be hidden from the clients.
- User mobility: Automatically bring the user’s environment (e.g. user’s home directory) to the node where the user logs in.
- Performance: Performance is measured as the average amount of time needed to satisfy client requests. This time includes CPU time + time for accessing secondary storage + network access time. It is desirable that the performance of a distributed file system be comparable to that of a centralized file system.
- Simplicity and ease of use: The user interface to the file system should be simple, and the number of commands should be as small as possible.
- Scalability: Growth of nodes and users should not seriously disrupt service.
- High availability: A distributed file system should continue to function in the face of partial failures such as a link failure, a node failure, or a storage device crash. A highly reliable and scalable DFS should therefore have multiple independent file servers controlling multiple independent storage devices.
- High reliability: The probability of loss of stored data should be minimized. The system should automatically generate backup copies of critical files.
- Data integrity: Concurrent access requests from multiple users competing for the same file must be properly synchronized by some form of concurrency-control mechanism. Atomic transactions can also be provided.
- Security: Users should be confident of the privacy of their data.
- Heterogeneity: There should be easy access to shared data from diverse platforms (e.g., UNIX workstations, Wintel platforms).
REFERENCES
ALMES, G. T., BLACK, A. P., LAZOWSKA, E. D., AND NOE, J. D. 1985. The Eden system: A technical review. IEEE Trans. Softw. Eng. SE-11, 1 (Jan.), 43-59.
BARAK, A., MALKI, D., AND WHEELER, R. 1986. AFS, BFS, CFS ... or Distributed File Systems for UNIX. In European UNIX Users Group Conference Proceedings (Sept. 22-24, Manchester, U.K.). EUUG, pp. 461-472.
BARAK, A., AND PARADISE, O. G. 1986. MOS: Scaling up UNIX. In Proceedings of USENIX 1986 Summer Conference. USENIX Association, Berkeley, California, pp. 414-418.
FLOYD, R. 1989. Transparency in distributed file systems. Tech. Rep. 272, Department of Computer Science, University of Rochester.
HOWARD, J. H., KAZAR, M. L., MENEES, S. G., NICHOLS, D. A., SATYANARAYANAN, M., AND SIDEBOTHAM, R. N. 1988. Scale and performance in a distributed file system. ACM Trans. Comput. Syst. 6, 1 (Feb.), 55-81.
RITCHIE, D. M., AND THOMPSON, K. 1974. The UNIX time-sharing system. Commun. ACM 17, 7 (Jul.), 365-375.
TANENBAUM, A. S., AND VAN RENESSE, R. 1985. Distributed operating systems. ACM Comput. Surv. 17, 4 (Dec.), 419-470.