- NFS on the Internet
- Public Filehandle
- Multi-component Lookup
- NFS URL
- NFS and FTP
- NFS Performance
- Proxy and Caching
- NFS for Java
The Filesystem for the Internet
April 1997
NFS was designed as a file access protocol. NFS clients can "mount" a server filesystem and make it appear as if it were a local disk. The high speed of Ethernet provides performance that sometimes exceeds that of the local disk. Users no longer need to transfer entire files across the network; NFS moves file data automatically as needed.
The NFS protocol was designed to be lean and efficient on the relatively low-powered machines of the time. The stateless server permitted rapid and reliable recovery after a server crash, with no harm to clients other than a brief interruption in service. The designers of NFS were forward thinking in their consideration of other kinds of client platforms, as well as in providing access to a variety of filesystem types on a variety of server platforms. This foresight has been borne out by the implementation of NFS on all popular client and server platforms. NFS is a standard, both de facto and de jure: it is defined in RFCs 1094 (version 2) and 1813 (version 3) and in two X/Open specifications [X/Open90, X/Open91].
NFS has been outstandingly successful as a protocol for providing distributed access to files. In 1994 Dataquest reported 8.5 million computers using NFS for file access and estimated growth to 12 million computers in 1997.
This paper describes WebNFS™, a feature that eliminates the overhead of connecting to an NFS server. WebNFS makes information on NFS servers (versions 2 and 3) available to web browsers and Java™ applets. WebNFS also makes it easy to access Internet NFS servers through corporate firewalls.
First, I will provide some background that explains the issues that prompted WebNFS, explain the server changes, compare NFS with file transfer protocols (HTTP and FTP), and then describe the opportunity that WebNFS provides for Web browsers and Java programs.
2.1 UDP vs TCP
NFS implementations most commonly use UDP as a transport. UDP was preferred initially because it performed well on local area networks and was faster than TCP. While UDP benefited from the high bandwidth and low latency typical of LANs, it performed poorly when subjected to the low bandwidth and high latency of WANs like the Internet. Early experience with NFS on the Internet showed the protocol to be a bandwidth hog that used the network inefficiently.
In recent years, improvements in hardware and TCP implementations have eroded UDP's speed advantage to the point where TCP implementations can outperform UDP, and a growing number of NFS implementations now support TCP. Solaris™ 2.5 NFS uses TCP in preference to UDP.
2.2 NFS Version 3
Version 3 of the NFS protocol was designed as a cooperative effort by a number of NFS vendors including Sun, IBM, HP, DEC, and Data General. The protocol revision succeeded in overcoming a number of limitations in version 2 of the protocol.
2.2.1 File Offsets
Version 2 of the NFS protocol limited file offsets to a 32-bit quantity, which limited the size of files accessible by clients to 4 GB (2^32 bytes). For users who regularly access larger files, this was a severe limitation. NFS version 3 extended the file offsets and a number of other fields to 64 bits.
2.2.2 Transfer Size
NFS version 2 limited the data transfer size to 8 KB: no single read or write request could exceed 8 KB. This limits performance on high-bandwidth networks, since it artificially increases the number of NFS requests needed to transfer a given amount of data. NFS version 3 removed that limitation and allows the client and server to negotiate a maximum transfer size. Larger transfer sizes reduce the number of NFS requests required to move a given quantity of data, which makes better use of network bandwidth and of I/O resources on clients and servers. If the server supports it, a client can issue a read request that downloads a file in a single operation. Larger transfer sizes are particularly important on the Internet, where high latency penalizes protocols with large numbers of turnarounds.
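As a rough illustration of why the transfer size matters, the sketch below counts the READ requests needed to move a file at a given maximum transfer size. The 8 KB figure is the version 2 limit from the text; the larger sizes stand in for whatever maximum a version 3 client and server might negotiate and are not taken from any particular implementation.

```java
// Sketch: how many READ requests are needed to move a file when each
// request is capped at a maximum transfer size. The sizes used in main()
// are illustrative, not taken from any specific NFS implementation.
class TransferCount {
    // Ceiling division: a partial final chunk still costs a whole request.
    static long readRequests(long fileSize, long maxTransfer) {
        if (maxTransfer <= 0) {
            throw new IllegalArgumentException("maxTransfer must be positive");
        }
        if (fileSize <= 0) {
            return 0; // nothing to transfer
        }
        return (fileSize + maxTransfer - 1) / maxTransfer;
    }

    public static void main(String[] args) {
        long oneMegabyte = 1024L * 1024L;
        System.out.println("v2, 8 KB limit: " + readRequests(oneMegabyte, 8 * 1024));    // 128
        System.out.println("v3, 32 KB:      " + readRequests(oneMegabyte, 32 * 1024));   // 32
        System.out.println("v3, whole file: " + readRequests(oneMegabyte, oneMegabyte)); // 1
    }
}
```

Each of those requests is a turnaround, so on a high-latency link the 128 version 2 requests for a 1 MB file cost 128 round trips unless the client overlaps them.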
2.2.3 Write Performance
Version 2 NFS servers must commit data written by a client to stable storage (a disk or NVRAM) before responding affirmatively to the client. This avoids the potential for lost data if the server should crash. Since this requirement limits the server's ability to buffer I/O efficiently in memory, it imposes a performance penalty on client writes. While the use of NVRAM [Moran90] [Presto93] on the server can provide some write buffering, it adds an expensive configuration requirement for servers.
NFS version 3 provides a new COMMIT operation that allows a client to perform "unstable" writes to a server followed by a COMMIT request. The server is required to verify that client data is on stable storage only when it receives the COMMIT request. A mechanism is provided that allows the client to detect server loss of uncommitted data and recover. Measurements of NFS V3 implementations [Pawlowski94] show that writes are now much faster, limited only by the speed of the network.
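In NFS version 3 the recovery mechanism is the write verifier: each unstable WRITE reply and each COMMIT reply carries a value that the server changes whenever it restarts, so a mismatch tells the client that uncommitted data may be gone. The following is a minimal sketch of that bookkeeping, assuming an in-memory stand-in for the server. Server, FakeServer, and Client are hypothetical names, and the verifier is simplified from the protocol's 8-byte opaque value to a long.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the "unstable WRITE, then COMMIT" pattern and the write verifier
// a client uses to detect that a server lost uncommitted data. Server and
// FakeServer are hypothetical stand-ins for the real RPCs.
class UnstableWriteDemo {
    interface Server {
        long write(long offset, byte[] data); // UNSTABLE write; returns the verifier
        long commit();                        // flush to stable storage; returns the verifier
    }

    // In-memory server. crash() discards uncommitted data and picks a new
    // verifier, as a rebooted server would.
    static class FakeServer implements Server {
        long verifier = 1;
        int commits = 0;
        final Map<Long, byte[]> unstable = new HashMap<>();
        final Map<Long, byte[]> stable = new HashMap<>();

        public long write(long offset, byte[] data) {
            unstable.put(offset, data.clone());
            return verifier;
        }

        public long commit() {
            commits++;
            stable.putAll(unstable);
            unstable.clear();
            return verifier;
        }

        void crash() {
            unstable.clear();
            verifier++;
        }
    }

    static class Client {
        private static class Pending {
            final long offset;
            final byte[] data;
            long verifier;

            Pending(long offset, byte[] data, long verifier) {
                this.offset = offset;
                this.data = data;
                this.verifier = verifier;
            }
        }

        private final Server server;
        private final List<Pending> pending = new ArrayList<>();

        Client(Server server) { this.server = server; }

        void write(long offset, byte[] data) {
            pending.add(new Pending(offset, data, server.write(offset, data)));
        }

        // Keep the data until a COMMIT returns the verifier every pending
        // WRITE saw; otherwise rewrite all pending data and COMMIT again.
        void commit() {
            while (true) {
                long committed = server.commit();
                boolean lost = false;
                for (Pending p : pending) {
                    if (p.verifier != committed) { lost = true; break; }
                }
                if (!lost) { pending.clear(); return; }
                for (Pending p : pending) {
                    p.verifier = server.write(p.offset, p.data);
                }
            }
        }
    }

    public static void main(String[] args) {
        FakeServer server = new FakeServer();
        Client client = new Client(server);
        client.write(0, new byte[] {1, 2});
        server.crash();                       // the first write is lost before COMMIT
        client.write(2, new byte[] {3});
        client.commit();
        // prints: stable blocks: 2, commits: 2
        System.out.println("stable blocks: " + server.stable.size() + ", commits: " + server.commits);
    }
}
```

Rewriting every pending block after a mismatch is the simple, safe choice: each WRITE names its own offset, so repeating one that did survive does no harm.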
2.3.1 TCP vs UDP
The firewalls that protect corporate intranets from Internet attackers have also been an effective barrier to the use of NFS on the Internet. Packet filtering firewalls are relatively easy to configure for protocols that use TCP on well-known ports. UDP based protocols are perceived as insecure because of their vulnerability to replay attacks.
2.3.2 Port Number
While NFS can be run over a TCP connection, its use of a well-known port is only de facto: almost all NFS implementations use UDP or TCP port 2049. As an RPC-based protocol, NFS is supposed to have its port determined via a portmap service that registers on well-known port 111 and maps RPC program numbers to dynamically assigned ports (see FIGURE 1). To communicate with an RPC service, a client must first submit the program number of the service to the server's portmapper and receive the port assigned to the service. Although some sophisticated firewalls can track these port negotiations, this is not a common feature. Both packet filter firewalls and application-level proxies, like SOCKS, prefer TCP streams to well-known ports.
Since almost all NFS implementations use port 2049, it would seem easy to permit firewall passage of NFS traffic by having clients skip the portmapper protocol and communicate directly with the server on port 2049. However, the client cannot communicate at all with an NFS server unless it has first obtained an initial filehandle using another RPC service: the MOUNT protocol.
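For illustration, here is the portmapper query a client must make before it can talk to an RPC service whose port it does not know. This sketch only builds the XDR-encoded call for the portmapper's GETPORT procedure (program 100000, version 2, procedure 3), asking for the NFS program (100003) over TCP; it assumes AUTH_NONE credentials and leaves out the socket code and reply parsing. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Sketch: the Sun RPC call a client sends to the portmapper (port 111) to
// ask which port an NFS service uses. XDR integers are 4-byte big-endian,
// which is ByteBuffer's default byte order.
class PortmapGetPort {
    static final int PMAP_PROG = 100000, PMAP_VERS = 2, PMAPPROC_GETPORT = 3;
    static final int NFS_PROG = 100003;
    static final int IPPROTO_TCP = 6, IPPROTO_UDP = 17;

    static byte[] encodeCall(int xid, int nfsVersion, int protocol) {
        ByteBuffer b = ByteBuffer.allocate(56);   // 10 header words + 4 argument words
        b.putInt(xid);
        b.putInt(0);                              // msg_type = CALL
        b.putInt(2);                              // RPC protocol version 2
        b.putInt(PMAP_PROG);
        b.putInt(PMAP_VERS);
        b.putInt(PMAPPROC_GETPORT);
        b.putInt(0); b.putInt(0);                 // credential: AUTH_NONE, empty body
        b.putInt(0); b.putInt(0);                 // verifier: AUTH_NONE, empty body
        // Argument: struct mapping { prog, vers, prot, port }; port is ignored here.
        b.putInt(NFS_PROG);
        b.putInt(nfsVersion);
        b.putInt(protocol);
        b.putInt(0);
        return b.array();
    }

    // Over TCP each RPC message is preceded by a 4-byte record mark: the high
    // bit flags the last fragment and the low 31 bits give the fragment length.
    static byte[] withRecordMark(byte[] msg) {
        ByteBuffer b = ByteBuffer.allocate(4 + msg.length);
        b.putInt(0x80000000 | msg.length);
        b.put(msg);
        return b.array();
    }

    public static void main(String[] args) {
        byte[] call = encodeCall(1, 3, IPPROTO_TCP);
        System.out.println(call.length + " bytes, " + withRecordMark(call).length + " framed for TCP");
    }
}
```

Sent to port 111, this message costs one extra turnaround before any NFS traffic can flow, and a firewall has to understand the reply to know which port to open.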
2.4 MOUNT Protocol
An NFS filehandle is a value generated by the server that the client uses to uniquely identify a file or directory on the server. NFS version 2 filehandles are a fixed 32 bytes; version 3 filehandles are variable length, up to a maximum of 64 bytes.
FIGURE 2. Mount dialog
2.5 Summary
NFS is not seen on the Internet for the following reasons:
FIGURE 3. Public Filehandles
FIGURE 4. Access with Public Filehandle
4.1 Pathname Evaluation and Latency
WANs are noted for high latency. The speed of light alone imposes a minimum round-trip latency of about 30 ms across the United States, and latency becomes a significant problem across satellite links. Protocols that require a high number of turnarounds (a request message followed by a reply) between client and server are particularly sensitive to latency effects.
4.1.1 Initialization Overhead
The NFS protocol was designed for LANs, where latency is low and turnarounds are not a significant problem. A single read request requires a minimum of 5 turnarounds (portmap, mount, portmap, lookup, read). The public filehandle eliminates the portmap and mount turnarounds: the very first message sent to the server can be an NFS operation, e.g., LOOKUP.
4.1.2 Pathname Evaluation
Evaluating a pathname is expensive in turnarounds. The NFS protocol permits the evaluation of only a single pathname component at a time, so a LOOKUP request must be sent to the server for every component in the pathname. It can therefore be quite expensive to locate the filehandle for a file several directory levels away from the public filehandle, and this latency can become intolerable if the server is on the other side of the world. For instance, a client in California communicating with a server in Moscow incurs a 100 ms speed-of-light delay for each turnaround, so evaluating a pathname with 10 components would take at least 1 second.
FIGURE 5. Pathname and Component Names
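The turnaround arithmetic in this section can be written out directly. This is a lower-bound sketch that assumes each turnaround costs one full round trip and must wait for the previous one; it ignores server processing and transmission time. The 100 ms round trip is the California-to-Moscow figure from the text, and the two-turnaround case assumes a LOOKUP of a single name relative to the public filehandle followed by a READ.

```java
// Lower-bound latency arithmetic for the turnaround counts in the text.
// Assumes serialized turnarounds of exactly one round trip each.
class TurnaroundLatency {
    static long minimumMillis(int turnarounds, long roundTripMillis) {
        return turnarounds * roundTripMillis;
    }

    public static void main(String[] args) {
        long rtt = 100; // California to Moscow, as in the text
        // portmap, mount, portmap, lookup, read
        System.out.println("classic first READ:  " + minimumMillis(5, rtt) + " ms");  // 500 ms
        // public filehandle: LOOKUP of one name, then READ
        System.out.println("public filehandle:   " + minimumMillis(2, rtt) + " ms");  // 200 ms
        // ten-component pathname, one LOOKUP per component
        System.out.println("10-component lookup: " + minimumMillis(10, rtt) + " ms"); // 1000 ms
    }
}
```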
4.2 Server Evaluation of Pathname
A WebNFS server can evaluate an entire pathname in place of a component name when the LOOKUP is relative to the public filehandle. The NFS version 2 protocol limits this pathname to 255 characters. UNIX pathnames can be as long as 1023 characters, but pathnames longer than 255 characters are extremely rare. NFS version 3 does not limit the length of component names. This process of evaluating a pathname on the server is termed "Multi-Component Lookup" (MCL).
To illustrate, here is a comparison of pathname evaluation with and without MCL:
FIGURE 6. Multi-component Lookup Comparison
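The same comparison can be sketched in code. LookupServer and CountingServer are hypothetical stand-ins for the LOOKUP RPC, filehandles are modelled as strings, and each call counts as one turnaround; a real server returns opaque filehandles.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch contrasting client-side pathname evaluation (one LOOKUP per
// component) with multi-component lookup (one LOOKUP for the whole path).
class LookupDemo {
    interface LookupServer {
        String lookup(String dirHandle, String name); // one call = one turnaround
    }

    static final String PUBLIC_FH = "<public>";

    // Classic NFS: the client splits the path and walks it one component at a time.
    static String lookupPerComponent(LookupServer server, String path) {
        String fh = PUBLIC_FH;
        for (String component : path.split("/")) {
            if (component.isEmpty()) continue; // skip empty components from doubled slashes
            fh = server.lookup(fh, component);
        }
        return fh;
    }

    // WebNFS: relative to the public filehandle, the whole path goes in one LOOKUP.
    static String lookupMultiComponent(LookupServer server, String path) {
        return server.lookup(PUBLIC_FH, path);
    }

    // Toy server that records each request and "resolves" a name by appending it.
    static class CountingServer implements LookupServer {
        final List<String> calls = new ArrayList<>();

        public String lookup(String dirHandle, String name) {
            calls.add(name);
            return dirHandle + "/" + name;
        }
    }

    public static void main(String[] args) {
        String path = "docs/reports/1997/april/index.html";
        CountingServer classic = new CountingServer();
        CountingServer webnfs = new CountingServer();
        lookupPerComponent(classic, path);
        lookupMultiComponent(webnfs, path);
        System.out.println(classic.calls.size() + " LOOKUPs vs " + webnfs.calls.size()); // 5 LOOKUPs vs 1
    }
}
```

The per-component loop works against any NFS server; the single LOOKUP relies on the server supporting multi-component lookup relative to the public filehandle.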
A path that evaluates to a file or directory outside the exported filesystem may result in a "not found" error from some servers, though this policy is entirely up to the server implementation. Solaris 2.6 WebNFS servers will evaluate symbolic links that cross into other exported filesystems, though the client's ability to use the resulting filehandle depends on the access control for that export.
A path that names a symbolic link will retrieve the filehandle for the symbolic link, and it is up to the client to decide how to interpret the contents of the link. The link may contain a URL, which the client can detect by checking for the character sequence "://". If the link is a URL, the client can evaluate it; it need not be an NFS URL.
If the link does not appear to be a URL, the client must have the server evaluate it relative to the public filehandle. If the link text begins with a slash, it is an absolute symbolic link and should be evaluated on its own. If it does not begin with a slash, the link can be assumed to be relative: its text should be appended to the pathname of the directory containing the symbolic link, and the result evaluated relative to the public filehandle.
FIGURE 7. Symbolic Link Text
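The rules for interpreting link text can be sketched as follows. The class and method names are hypothetical; linkPath is the path, relative to the public filehandle, that named the link. A relative link is resolved against the directory that contains the link, as with ordinary UNIX symbolic links, and the sketch does not normalize ".." components, leaving them for the server to evaluate.

```java
// Sketch of how a WebNFS client might act on symbolic link text returned
// by READLINK: follow it as a URL, or build the path for a LOOKUP.
class LinkResolver {
    enum Kind { URL, ABSOLUTE_PATH, RELATIVE_PATH }

    static Kind classify(String linkText) {
        if (linkText.contains("://")) return Kind.URL; // any scheme, not only nfs
        if (linkText.startsWith("/")) return Kind.ABSOLUTE_PATH;
        return Kind.RELATIVE_PATH;
    }

    // Returns either the URL to follow or the pathname to send in a
    // multi-component LOOKUP relative to the public filehandle.
    static String resolve(String linkPath, String linkText) {
        switch (classify(linkText)) {
            case URL:
                return linkText;
            case ABSOLUTE_PATH:
                return linkText; // keeps its leading slash: evaluated from the server root
            default:
                int slash = linkPath.lastIndexOf('/');
                String parentDir = slash < 0 ? "" : linkPath.substring(0, slash + 1);
                return parentDir + linkText; // "docs/latest" + "v2/index.html" -> "docs/v2/index.html"
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("docs/latest", "v2/index.html"));
        System.out.println(resolve("docs/latest", "/export/docs"));
        System.out.println(resolve("docs/latest", "http://www.example.com/docs/"));
    }
}
```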
4.4 Absolute vs Relative Paths
The pathname taken from an NFS URL should be evaluated relative to the public filehandle, i.e., the path should not begin with a slash unless it is being evaluated as an absolute symbolic link. A path that begins with a slash will be evaluated relative to the server's root, rather than the public filehandle directory.
5.1 URL Structure
WebNFS clients can be World Wide Web browsers that browse documents on WebNFS servers using the NFS protocol, either version 2 or version 3. These documents are accessed via the NFS URL:
nfs://server[:port]/path
If the :port is omitted, it defaults to 2049. The path is evaluated by the server using multi-component lookup relative to the public filehandle (See FIGURE 8). Note that the path in the NFS LOOKUP request should not begin with a slash unless one is specifically prepended, e.g., nfs://server//path.
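A client has to split the URL into a server, a port, and the path for the LOOKUP. The sketch below is one way to do that in Java. The NfsUrl class is hypothetical, and it relies on java.net.URI rather than java.net.URL because the standard URL class rejects schemes, such as nfs, that have no registered protocol handler.

```java
import java.net.URI;

// Sketch: split an NFS URL into host, port, and the path to send in a
// multi-component LOOKUP relative to the public filehandle.
class NfsUrl {
    static final int DEFAULT_PORT = 2049;

    final String host;
    final int port;
    final String lookupPath;

    NfsUrl(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException if malformed
        if (!"nfs".equalsIgnoreCase(uri.getScheme()) || uri.getHost() == null) {
            throw new IllegalArgumentException("not an nfs://server/path URL: " + url);
        }
        host = uri.getHost();
        port = uri.getPort() == -1 ? DEFAULT_PORT : uri.getPort();
        String path = uri.getPath() == null ? "" : uri.getPath(); // percent-decoded
        // Drop the one slash that separates server from path, so the path is
        // relative to the public filehandle. "nfs://server//path" therefore
        // keeps a leading slash and is evaluated from the server's root.
        lookupPath = path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        NfsUrl u = new NfsUrl("nfs://myserver/docs/index.html");
        System.out.println(u.host + " " + u.port + " " + u.lookupPath); // myserver 2049 docs/index.html
    }
}
```

For example, nfs://server:1234//export/home yields port 1234 and the LOOKUP path /export/home, which the server evaluates from its root rather than from the public filehandle.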
5.2 Fetching a File
While HTTP clients obtain documents with a GET request, NFS clients must issue a LOOKUP request to obtain a filehandle for the document, followed by a READ call to download the data. The reply to the LOOKUP request includes the filehandle along with the file attributes. The client must check the file size in the attributes to decide whether to fetch the file with a single large READ request, or if limited by the server's transfer size, by multiple smaller READ requests.
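The fetch logic described above can be sketched as a loop over READ requests. Reader is a hypothetical stand-in for the READ RPC, the file size is assumed to come from the attributes in the LOOKUP reply, and maxTransfer stands for the server's transfer-size limit. When the file fits within one transfer, the loop issues a single READ.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Sketch of fetching a file with READ requests after a LOOKUP.
class FetchFile {
    interface Reader {
        byte[] read(long offset, int count); // may return fewer bytes at end of file
    }

    static byte[] fetch(Reader reader, long fileSize, int maxTransfer) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long offset = 0;
        while (offset < fileSize) {
            // One READ if the file fits in a single transfer, otherwise several.
            int count = (int) Math.min(maxTransfer, fileSize - offset);
            byte[] chunk = reader.read(offset, count);
            if (chunk.length == 0) {
                break; // file shrank since the LOOKUP: stop at end of file
            }
            out.write(chunk, 0, chunk.length);
            offset += chunk.length;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] file = new byte[20_000];
        int[] reads = {0};
        // In-memory stand-in for a server holding the file.
        Reader server = (offset, count) -> {
            reads[0]++;
            int end = (int) Math.min(file.length, offset + count);
            return Arrays.copyOfRange(file, (int) offset, end);
        };
        fetch(server, file.length, 8192);
        System.out.println(reads[0] + " READs"); // 3 READs
    }
}
```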
5.3 Fetching a Directory
If the attributes returned by the LOOKUP request indicate that the requested object is a directory, then the browser must issue either a READDIR request (version 2) or a READDIRPLUS request (version 3). The READDIRPLUS request has the advantage that it returns filehandle and file attribute information along with the names of the directory entries. Version 2 clients must issue individual LOOKUP requests for every directory entry to retrieve this information. For large directories, the saving in network calls by READDIRPLUS is significant.
5.4 Fetching a Symbolic Link
If the attributes returned by a LOOKUP indicate that the requested object is a symbolic link, the browser should retrieve the link text with a READLINK request. It should then interpret the link text as described in section 4.3.
5.5 Document Types
Documents retrieved by NFS do not arrive with MIME headers, so the client must identify document types using file suffixes or other means. This is commonly done with reference to a ".mime.types" file that maps file suffixes to MIME types.
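A suffix lookup of this kind might look like the sketch below. It assumes the widely used layout in which each line names a MIME type followed by its suffixes (for example, text/html html htm) and "#" starts a comment; other layouts of such files exist, and the application/octet-stream fallback is a choice made for this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch: map file suffixes to MIME types from mime.types-style text.
class MimeTypes {
    static final String DEFAULT_TYPE = "application/octet-stream";

    private final Map<String, String> bySuffix = new HashMap<>();

    MimeTypes(String fileContents) {
        for (String line : fileContents.split("\n")) {
            int hash = line.indexOf('#');
            if (hash >= 0) line = line.substring(0, hash); // strip comments
            String[] fields = line.trim().split("\\s+");
            // fields[0] is the type; the remaining fields are its suffixes.
            for (int i = 1; i < fields.length; i++) {
                bySuffix.put(fields[i].toLowerCase(Locale.ROOT), fields[0]);
            }
        }
    }

    String typeOf(String path) {
        int dot = path.lastIndexOf('.');
        if (dot < 0 || dot < path.lastIndexOf('/')) return DEFAULT_TYPE; // no suffix in last component
        String type = bySuffix.get(path.substring(dot + 1).toLowerCase(Locale.ROOT));
        return type != null ? type : DEFAULT_TYPE;
    }

    public static void main(String[] args) {
        MimeTypes types = new MimeTypes("# sample\ntext/html html htm\nimage/gif gif\n");
        System.out.println(types.typeOf("docs/index.html"));
        System.out.println(types.typeOf("docs/README"));
    }
}
```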
5.6 Preferred Binding Procedure
A WebNFS client would ideally find that it can make a TCP connection to the server and have public filehandle support with NFS version 3; however, many NFS servers still support none of these. The client must be prepared to handle:
5.6.1 MOUNT Protocol
If the server does not support public filehandles, the MOUNT protocol must be used to convert the path in the URL to a filehandle.
If the browser uses the MOUNT protocol, it should be careful to set the RPC authentication credentials to AUTH_UNIX since some servers may otherwise reject requests.
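Putting the binding rules together, a client might probe the preferred combinations in order and fall back to MOUNT only when none accepts a LOOKUP on the public filehandle. This is a sketch under assumptions: the order below (TCP before UDP, version 3 before version 2 on each transport) is one reasonable reading of "ideally", not a rule from the text, and the probe is a hypothetical callback.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Sketch of a binding procedure with an assumed preference order.
class Binding {
    enum Transport { TCP, UDP }

    static final class Option {
        final Transport transport;
        final int nfsVersion;

        Option(Transport transport, int nfsVersion) {
            this.transport = transport;
            this.nfsVersion = nfsVersion;
        }

        @Override
        public String toString() { return transport + "/NFSv" + nfsVersion; }
    }

    static final List<Option> PREFERENCE = Arrays.asList(
            new Option(Transport.TCP, 3), new Option(Transport.TCP, 2),
            new Option(Transport.UDP, 3), new Option(Transport.UDP, 2));

    // Returns the first option whose public-filehandle LOOKUP succeeds, or
    // null, meaning the client must use MOUNT (with AUTH_UNIX credentials).
    static Option choose(Predicate<Option> publicLookupSucceeds) {
        for (Option o : PREFERENCE) {
            if (publicLookupSucceeds.test(o)) return o;
        }
        return null;
    }

    public static void main(String[] args) {
        // Example: a server that answers public-filehandle LOOKUPs only with NFS version 2 over UDP.
        Option o = choose(opt -> opt.transport == Transport.UDP && opt.nfsVersion == 2);
        System.out.println(o == null ? "fall back to MOUNT" : "use " + o); // use UDP/NFSv2
    }
}
```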
6.1 Connection Overhead
An FTP client must first establish a TCP connection with the server. This is only a control connection; it is not used to transfer data. The client transmits commands to the server and receives replies that indicate success or failure. The file data is transferred over another TCP connection that is established anew for each file to be transferred, then closed to indicate the end of the file. So transferring n files requires n + 1 TCP connections. The data connections can be problematic for clients accessing servers through a packet filter firewall. While the control connection is an outgoing connection (initiated by the client), the data connections are incoming (initiated by the server) and are not directed to a fixed, well-known port on the client. Either the firewall must include special code to track the state of the FTP connections and allow the incoming connections selectively, or clients must be modified to support PASV connections [Bellovin94].
TCP connections for NFS are much simpler: the client establishes a connection to the server which is maintained for all file transfers until the client elects to break the connection. The connection is shared for both control and data.
6.2 Reliability
FTP is often used to download large data archives over slow links. These downloads can take many minutes, even hours. The client sends a RETR command (the "get" command of most FTP clients), which initiates a file download on a separate connection. If that connection is broken because of a network or server outage, the download cannot easily be resumed and must often be restarted from the beginning. Though the FTP protocol describes a restart procedure (in block and compressed modes), its implementation is awkward and correspondingly rare.
NFS clients download files with a series of READ requests. Each READ request contains a file offset and a count of the number of bytes to be read. For long, sequential reads typical of file transfer, a client will often "read-ahead", i.e., have concurrent READ requests each for a different chunk of the file. If the connection is broken, the client persists in attempts to create a new connection, and when successful, it continues to transmit READ requests from the last successful READ before the break. The user of an NFS client can rest easy in the knowledge that once a download is started, the client will persist until the transfer is complete.
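Because every READ carries its own offset, resuming after a broken connection needs no help from the server: the client only has to remember how far it got. The sketch below shows that loop under simplifying assumptions. Connection and Connector are hypothetical stand-ins for an RPC transport, the sketch issues one READ at a time rather than the concurrent read-ahead described above, and it retries forever where a real client would back off between attempts.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Sketch of resuming an NFS-style download after a broken connection.
class ResumableDownload {
    interface Connection {
        byte[] read(long offset, int count) throws IOException;
    }

    interface Connector {
        Connection connect() throws IOException;
    }

    static byte[] download(Connector connector, long fileSize, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long offset = 0;          // end of the last successful READ
        Connection conn = null;
        while (offset < fileSize) {
            try {
                if (conn == null) {
                    conn = connector.connect();
                }
                byte[] data = conn.read(offset, (int) Math.min(chunkSize, fileSize - offset));
                if (data.length == 0) {
                    break;        // unexpected end of file
                }
                out.write(data, 0, data.length);
                offset += data.length;
            } catch (IOException e) {
                conn = null;      // drop the broken connection; the next pass reconnects
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] file = new byte[10_000];
        for (int i = 0; i < file.length; i++) file[i] = (byte) i;
        int[] connects = {0};
        // Fake transport: every connection serves one READ, then "breaks".
        Connector flaky = () -> {
            connects[0]++;
            int[] served = {0};
            return (offset, count) -> {
                if (served[0]++ > 0) throw new IOException("connection reset");
                int end = (int) Math.min(file.length, offset + count);
                return Arrays.copyOfRange(file, (int) offset, end);
            };
        };
        byte[] copy = download(flaky, file.length, 1024);
        System.out.println(Arrays.equals(copy, file) + " after " + connects[0] + " connections");
    }
}
```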
6.3 Server Scalability
The FTP protocol is served by a user-level daemon. Each client control connection is serviced by a process on the server. A server with hundreds or thousands of active TCP connections is required to support the same number of active processes, all moving data between the disks and the network. While this data movement on the server might be improved by file memory mapping and zero-copy TCP support, the overhead of maintaining a process context will always be greater than that of a protocol implemented entirely within the operating system, like NFS.
Protocols like HTTP and FTP that are provided by user-level daemons are hobbled by the limitations of their interface to the underlying operating system. The use of worker processes at user level, or even worker threads within a single process cannot match the tight integration and low overheads achievable within the operating system.
NFS servers are implemented within the server operating system and make low use of server resources. This is why NFS servers can handle much heavier client loads than HTTP or FTP servers running on the same hardware, while still providing better response times.
8.1 Authentication
NFS uses the authentication flavors supported by the underlying Remote Procedure Call layer. Sun RPC supports multiple authentication "flavors" or techniques. This is an open-ended design that can accept newer authentication flavors as they are developed. Currently the following flavors are supported:
8.2 Link Level Authentication and Encryption
Since NFS runs over TCP connections, it is a candidate for layering over link-level techniques for securing data on TCP streams. NFS is sure to support some of the emerging standards in this arena: Netscape's Secure Sockets Layer (SSL) [Freier96] and SKIP [Aziz93].
8.3 Authorization
Most UNIX implementations of NFS use UNIX permission bits to control access to files and directories. These permission bits control read, write, and execute or search permission for the owner of the file, the owner's group, and all other users.
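The owner/group/other rule can be sketched as a small check. This is illustrative only: it omits superuser handling and the supplementary group list that AUTH_UNIX credentials can carry, and the class and method names are hypothetical.

```java
// Sketch of the classic UNIX permission-bit check. Exactly one class of
// bits (owner, group, or other) applies, chosen by the caller's identity.
class UnixPermissions {
    static final int READ = 4, WRITE = 2, EXECUTE = 1; // EXECUTE means "search" on a directory

    static boolean allowed(int mode, int fileUid, int fileGid,
                           int callerUid, int callerGid, int wanted) {
        int bits;
        if (callerUid == fileUid) {
            bits = (mode >> 6) & 7;      // owner
        } else if (callerGid == fileGid) {
            bits = (mode >> 3) & 7;      // group
        } else {
            bits = mode & 7;             // other
        }
        return (bits & wanted) == wanted;
    }

    public static void main(String[] args) {
        int mode = 0640; // octal: rw- for owner, r-- for group, --- for others
        System.out.println(allowed(mode, 100, 10, 100, 10, READ | WRITE)); // owner: true
        System.out.println(allowed(mode, 100, 10, 200, 10, WRITE));        // group: false
        System.out.println(allowed(mode, 100, 10, 300, 20, READ));         // other: false
    }
}
```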
NFS servers generally support whatever file access control the server's operating system supports, whether simple permission bits (read, write, search, execute) or Access Control Lists (ACLs).
Most NFS servers also feature access control based on the client's network address.
9.1 Server Improvements
The LADDIS benchmark [Keith93] was developed by SPEC to measure the performance of NFS servers against simulated client loads. Publishing industry-leading LADDIS results has become a coveted goal among NFS server vendors, and to that end they aggressively tune their NFS implementations and hardware for peak performance. The processing power, memory, and disk resources available to NFS servers are increasing exponentially.
While NFS has been criticized for poor write performance, this is no longer an issue. Non-volatile memory [Moran90] has been used to cache NFS writes at memory speeds, and innovative filesystems developed specifically for NFS, like WAFL [Hitz94], have almost eliminated I/O delays due to disk rotational latency and seek time. The NFS version 3 protocol sidesteps the synchronous write requirement of version 2 by allowing the server to cache client writes in memory until the client requests, with a COMMIT request, that the data be flushed to stable storage.
Digital Equipment Corporation [Pawlowski94] demonstrated that an implementation of NFS V3 running on two Model 3000/600 workstations delivered peak write performance of 2,323 KB/s vs 320 KB/s for version 2, more than 7 times faster. With support from Prestoserve NVRAM and write gathering, V3 was still ahead with 6,425 KB/s vs 5,022 KB/s for V2.
A Sun UltraSPARC™ Enterprise 6000 server recently leapfrogged Silicon Graphics' previous SPEC [SPEC96] LADDIS throughput record by posting more than 21,000 NFSops/sec, almost double the previous record value.
As yet there is no support for a pure NFS proxy or proxy/cache mechanism in the NFS protocol. Research indicates that cache hit rates for proxy caches are not particularly high, so the benefit of a pure proxy cache for NFS is questionable.
11.1 File Class
The Java class library provides two packages for access to data: the "io" package and the "net" package. The io package provides access to files on a local disk. Its File class supports general file access functions such as directory creation and listing of directory contents, and its RandomAccessFile class provides random access to file data.
11.2 URL Class
If file data are on another server, then the URL class in the "net" package must be used. Although the URL class can handle different URL schemes (http, ftp, nntp, etc.), it is limited by the protocols used to implement these schemes. For instance, none of these URL schemes can be used for random access to data within a file, or to reliably return a file's creation date, last modification time, or size. Again, because of the scheme limitations, the URL class cannot provide a method for listing directory contents, since no URL scheme (other than NFS) provides a directory listing interface.
For security reasons, the File class is off-limits to most applets. Their only access to file data is through the URL class, which imposes its own limitations. While it would be trivial to add NFS URLs as yet another URL scheme, their utility would be limited by the URL class. The NFS protocol was designed to make remote filesystems appear as local files to programs.
11.3 File Access and "Pure" Java
In December 1996, JavaSoft, along with 100 other companies, announced a "100% Pure Java" initiative to encourage the development of Java applications that do not depend on platform-specific features. For instance, an application that makes use of native classes cannot run on Java platforms that lack those native classes. Customers who buy Java software should have a reasonable expectation that a product labelled "100% Pure Java" will run on any platform that supports a Java virtual machine.
However, a Java application that needs access to files cannot be platform-independent because the file naming is platform-specific. For instance, to open a file on a Unix platform I might use:
f = new File("/net/myserver/file.txt");
but on a Windows 95 platform I would need:
f = new File("\\\\myserver\\file.txt");
Whether the name is hard-coded in the application, obtained from a file, or typed in by the user, the application and/or the way it is used is platform-dependent.
11.4 File Naming with URL
A solution to this file naming problem can be found in the URL [RFC 1738], [RFC 1808]. A URL has the same syntax on any platform. Most URLs also specify a global name, e.g., an HTTP URL uniquely identifies a web page on a server anywhere in the world. In addition, the scheme name ahead of the first colon names the network protocol to be used. We are developing an API for java.io that will allow a URL to be used to open a file. The scheme name in the URL will name the file access protocol to be used, e.g., "file" for local files, "nfs" for files on an NFS server, and so on. Behind this API will be the implementation for each file access protocol, so the application need not have any file access protocol dependencies.
So to open a file on an NFS server a Java application might use:
f = new File(new URL("nfs://myserver/file.txt"));
and be assured that the file will be opened whether the file is on the same platform as the application or on the other side of the Internet, regardless of the filesystem or operating system that hosts the application or the file.
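The proposed API is still under development, so the following is only an illustration of the underlying idea, not that API: a registry picks a file access implementation from the URL's scheme. UrlFileOpener and Provider are hypothetical names; only the file provider performs real I/O, and an nfs provider would be registered the same way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch of scheme-based dispatch: the application names a file with a URL
// and a registry chooses the file access protocol from the scheme.
class UrlFileOpener {
    interface Provider {
        InputStream open(URI uri) throws IOException;
    }

    private final Map<String, Provider> providers = new HashMap<>();

    void register(String scheme, Provider provider) {
        providers.put(scheme.toLowerCase(Locale.ROOT), provider);
    }

    InputStream open(String url) throws IOException {
        URI uri = URI.create(url);
        String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        Provider provider = providers.get(scheme);
        if (provider == null) {
            throw new IOException("no file access protocol registered for " + url);
        }
        return provider.open(uri);
    }

    // Local files via java.nio; other protocols plug in through register().
    static UrlFileOpener withLocalFiles() {
        UrlFileOpener opener = new UrlFileOpener();
        opener.register("file", uri -> Files.newInputStream(Paths.get(uri)));
        return opener;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("webnfs", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        try (InputStream in = withLocalFiles().open(tmp.toUri().toString())) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The application code calls open() with a URL and never mentions a protocol, which is the property the text is after.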
11.5 Thin Clients
Thin clients will present a low-cost solution to information access at desktops. Since these devices will have no local storage, or only cache storage, users will depend on remote servers to maintain preferences and other personal files. NFS servers already provide a low-cost, high-performance, scalable file service.
[Bellovin94] Bellovin, S., "Firewall-Friendly FTP", RFC 1579, February 1994.
[CacheFS] Cache File System White Paper, Sun Microsystems, Inc.
[Eisler] M. Eisler, A. Chiu, L. Ling, "RPCSEC_GSS Protocol Specification", IETF Internet-Draft: draft-ietf-oncrpc-rpcsec_gss-0
[Freier96] Alan O. Freier, Philip Karlton, Paul C. Kocher, "The SSL Protocol, Version 3.0", March 1996.
[Hitz94] D. Hitz, J. Lau, M. Malcolm, "File System Design for an NFS File Server Appliance," Winter USENIX Conference Proceedings, USENIX Association, Berkeley, CA, January 1994.
[Keith93] Bruce E. Keith and Mark Wittle. "LADDIS: The Next Generation in NFS File Server Benchmarking." Summer USENIX Conference Proceedings. USENIX Association, Berkeley, CA, June 1993.
[Moran90] J. Moran, R. Sandberg, D. Coleman, J. Kepecs, B. Lyon, "Breaking Through the NFS Performance Barrier," Proceedings of the 1990 Spring UNIX Users Group, Munich, Germany, pages 199-206, April 1990.
[Pawlowski94] Brian Pawlowski, Chet Juszczak, Peter Staubach, Carl Smith, Diane Lebel, David Hitz, "NFS Version 3 Design and Implementation," Summer USENIX Conference Proceedings, USENIX Association, Berkeley, CA, June 1994.
[Postel85] Postel, J., and J. Reynolds, "File Transfer Protocol", STD 1, RFC 959, USC/Information Sciences Institute, October 1985.
[RFC 1738] Berners-Lee, T., Masinter, L., McCahill, M., "Uniform Resource Locators (URL)." http://www.internic.net/rfc/rfc1738.txt
[RFC 1808] Fielding, R., "Relative Uniform Resource Locators." http://www.internic.net/rfc/rfc1808.txt
[RFC 2054] Callaghan, B., "WebNFS Client Specification," http://www.internic.net/rfc/rfc2054.txt
[RFC 2055] Callaghan, B., "WebNFS Server Specification," http://www.internic.net/rfc/rfc2055.txt
[Sandberg85] R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, B. Lyon, "Design and Implementation of the Sun Network Filesystem," USENIX Conference Proceedings, USENIX Association, Berkeley, CA. Summer 1985.
[SPEC96] The Standard Performance Evaluation Corporation, SFS LADDIS results.
[X/Open90] X/Open Company. Developers' Specification: Protocols for X/Open PC Interworking: (PC)NFS. X/Open Company Limited, Reading, England, 1990.
[X/Open91] X/Open Company. CAE Specification: Protocols for X/Open Interworking: XNFS. X/Open Company Limited, Reading, England, 1991.
Written by Brent Callaghan