9. NFILE RESYNCHRONIZATION PROCEDURE

Ordinarily, the user side sends NFILE commands to the server side over the control connection; the server side responds to every user command, and file data is transmitted over the data channels. This section describes a resynchronization procedure that takes place when something disturbs the usual course of events.

First, if the server side aborts while sending or receiving data, nothing can be done to salvage the connection between the two hosts. The control connection and any data channels associated with this connection are broken. This happens rarely, if at all.

It is not unusual for the user side to abort file operations, either commands or data transfer. On a Symbolics computer, the user can do this by pressing CONTROL-ABORT. An important aspect of any file protocol is the way it handles the situation when the user side aborts file operations.

An NFILE user side reacts to user side aborts by immediately marking the connection unsafe. When a control connection is unsafe, it must be resynchronized before it can be used again. Data channels can also be marked unsafe, and must also be resynchronized before further use. The resynchronization process rids the connection (whether control or data connection) of bytes of data that are now unwanted, and thus cleans up the channel so it can be used again.

The resynchronization procedure is somewhat complex, but it fulfills a genuine need. For those interested, a brief design discussion is included as note <3>.
9.1 NFILE Control Connection Resynchronization

NFILE requires any unsafe control connection to undergo a resynchronization procedure before further use. However, the resynchronization does not necessarily occur immediately after the control connection is marked unsafe: the user side initiates the control connection resynchronization when another operation on the control connection is attempted.

A "mark" is defined in the context of Byte Stream with Mark: See the section "Discussion of Byte Stream with Mark", section 12.1.

USER SIDE STEPS: CONTROL CONNECTION RESYNCHRONIZATION

1. The user side sends a mark over the control connection to the server.
2. The user side sends the ASCII characters USER-RESYNC-DUMMY (as a data token) to the server.
3. The user side sends a second mark to the server.
4. The user side declares the control connection safe (at the token list level).
5. The user side generates and sends a unique data token to the server.
6. The user side then waits, expecting to detect a mark followed by the unique data token. The user side reads and discards all tokens and marks until the desired match is found.

Once the user side detects the mark and unique data token, the control connection has been fully resynchronized and can be used again.

SERVER SIDE STEPS: CONTROL CONNECTION RESYNCHRONIZATION

1. The server side detects a mark. The server is thus alerted that the control connection is unsafe and that resynchronization is in progress.
2. The server continues to read data coming from the user side until it detects the second mark and the token following it.
3. The server checks whether the token following the mark is USER-RESYNC-DUMMY. This rare situation occurs if the user aborts during the course of the resynchronization itself. If so, the server side discards the USER-RESYNC-DUMMY token. The control connection is still unsafe, and the user side restarts the resynchronization procedure; the server side therefore begins at Step 2 again.
4. If the token following the mark is not USER-RESYNC-DUMMY (this is the expected circumstance), the server should have received a single data token that is the unique data token generated by the user side.
   a. The server sends a mark to the user side.
   b. The server declares the control connection safe (at the token list level).
   c. The server sends the unique data token to the user side.
5. If the server detects something following the mark that is neither USER-RESYNC-DUMMY nor a single data token, a protocol error has occurred.

9.2 NFILE Data Connection Resynchronization

The NFILE data channel resynchronization procedure is similar to the NFILE control connection resynchronization. Both procedures are based on a mark signalling the unsafe condition, then a second mark followed by a unique identifier.

One important difference between the two procedures is the circumstances in which they occur. Control connections are put into unsafe states only when the user aborts during control connection I/O operations. Data channels are made unsafe by a larger set of circumstances:
- User aborts occur during the file protocol operations that assign and deassign data channels. This is the most common cause of data channels becoming unsafe.
- A server receives a CLOSE command (with abort-p supplied as Boolean truth) specifying an open file that has not finished transmitting data. That is, file reading is aborted.
- The ABORT command is issued, causing data channels to be made unsafe.
- The FILEPOS command is issued, causing the input data channel to become unsafe.

The resynchronization clears the data channel of unwanted data from aborted operations and puts the data channel in a known state. The data channel resynchronization procedure is invoked when the user side gives the RESYNCHRONIZE-DATA-CHANNEL command over the control connection.

The following policies can be used to improve response time, but are not required by the NFILE protocol: the user side can initiate resynchronization only when it needs a data channel, having first tried to use a free data channel that does not require resynchronization. Also, the user side can periodically resynchronize all unsafe data channels.

In giving the RESYNCHRONIZE-DATA-CHANNEL command, the user side indicates which data channel should be resynchronized. Data channels are unidirectional, which means that depending on the direction (input or output) of the data channel, either the user side or the server side sends the resynchronization data. This is another difference from the resynchronization of the control connection, in which the resynchronization data is always sent by the user side. The resynchronization steps for input data channels are therefore different from the steps for output data channels.
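Every procedure in this section, including the control connection procedure of section 9.1, comes down to a receiver that discards everything until it sees a mark followed by the token it expects. The Python sketch below models those receiving loops over an in-memory list rather than a real connection. Its assumptions are not part of NFILE: data tokens are Python strings, token lists are Python lists, `MARK` is a hypothetical sentinel standing in for a Byte Stream with Mark mark, and the function names are illustrative.

```python
import uuid

# Hypothetical sentinel standing in for a Byte Stream with Mark "mark".
MARK = object()
USER_RESYNC_DUMMY = "USER-RESYNC-DUMMY"


class ProtocolError(Exception):
    """Raised when a mark is followed by neither USER-RESYNC-DUMMY nor a data token."""


def new_unique_token() -> str:
    # Any value that cannot be confused with stale data will do; a random
    # UUID string is one assumed choice.
    return uuid.uuid4().hex


def read_through_mark(items: list, unique: str) -> list:
    """Discard tokens and marks until a MARK immediately followed by `unique`.

    Covers control connection user step 6, input data channel user step 5,
    and output data channel server steps 4-8. Returns what follows the match.
    """
    for i in range(len(items) - 1):
        if items[i] is MARK and items[i + 1] == unique:
            return items[i + 2:]
    raise EOFError("stream ended before resynchronization completed")


def server_control_resync(items: list) -> tuple:
    """Control connection server steps 2-5.

    `items` is everything read after the first mark (step 1). Returns the
    reply to send back and the items remaining after the unique token.
    """
    i = 0
    while True:
        # Step 2: skip to the next mark and examine the token after it.
        while i < len(items) and items[i] is not MARK:
            i += 1
        if i + 1 >= len(items):
            raise EOFError("stream ended before resynchronization completed")
        token = items[i + 1]
        i += 2
        if token == USER_RESYNC_DUMMY:
            continue  # Step 3: the user aborted mid-resync and restarted.
        if isinstance(token, str):
            # Step 4: send a mark, declare the connection safe, echo the token.
            return [MARK, token], items[i:]
        raise ProtocolError(f"unexpected token after mark: {token!r}")  # Step 5
```

The same `read_through_mark` helper fits both data channel directions. On an input channel, the user side waits for the identifier it received in the command response. On an output channel, the server's first mark is followed by a dummy identifier that simply fails to match, so the scan continues to the second mark.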
INPUT DATA CHANNEL RESYNCHRONIZATION

1. The user side gives the RESYNCHRONIZE-DATA-CHANNEL command on the control connection, with only one argument: the handle of the data channel to be resynchronized.
2. The server side of the data channel generates a unique identifier, and sends that data token in its regular command response to the user side.
3. The server side sends a mark over the data channel.
4. The server side sends the unique identifier token over the data channel.
5. The user side reads until it detects a mark followed by the unique identifier token. The resynchronization is then complete. The data channel is no longer in an unsafe state.

OUTPUT DATA CHANNEL RESYNCHRONIZATION

1. The user side gives the RESYNCHRONIZE-DATA-CHANNEL command on the control connection, with two arguments: the handle of the data channel to be resynchronized, and a unique identifier that it has just generated.
2. The user side of the data channel sends a mark.
3. The user side of the data channel sends a dummy identifier token. The dummy identifier can be any token that the server could not interpret as being the unique identifier. One suggestion is the data token DUMMY-IDENTIFIER.
4. The server side of the data channel was alerted by the RESYNCHRONIZE-DATA-CHANNEL command that resynchronization is in progress. The server side now reads the data, seeking the first mark.
5. The server side reads and discards the first mark and the dummy identifier.
6. The user side sends a second mark.
7. The user side sends the unique identifier.
8. The server side recognizes the mark and the unique identifier that follows, and the resynchronization is
complete. The data channel is no longer in the unsafe state.

10. NFILE ERRORS AND NOTIFICATIONS

NFILE recognizes two types of errors: command response errors and asynchronous errors. In addition to errors, NFILE supports notifications.

Command response errors:
- Signify an error that prevented the successful completion of the command; when such an error occurs, a command response error is sent instead of a normal command response.
- Occur frequently in normal operations.

Asynchronous errors:
- Are not related to any specific command.
- Are associated with an erring data channel.
- Typically indicate a problem in the transfer, such as running out of disk space or allocation, or an unreadable disk record.
- Occur rarely in normal operations.

Notifications:
- Are not associated with an error.
- Are sent at the server's discretion.
- Provide general information, such as a warning that the system is going down.

10.1 Notifications From the NFILE Server

The NFILE server can send asynchronous notifications to the user side over the control connection. The text of the notification contains information of interest to the person using NFILE, such as a warning that the server's operating system will be going down soon. Notifications can come from the server side at any time that the server is not sending something else.

The format of NFILE notifications is:

(NOTIFICATION "" text)

The empty string "" takes the place of a transaction identifier. Notifications are initiated by the server, and are not associated with any transaction originated by the user side.
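Because a notification carries an empty transaction identifier, a user side can separate notifications from command responses with a single check. This sketch assumes that decoded top-level lists are Python lists whose first element is the keyword name, and that every other message carries its tid as the second element, as the ERROR format in section 10.2 does. `ControlDispatcher` and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ControlDispatcher:
    # tid -> keyword of the command still awaiting a response (hypothetical).
    pending: dict = field(default_factory=dict)
    notifications: list = field(default_factory=list)

    def handle(self, message: list):
        keyword, tid = message[0], message[1]
        if keyword == "NOTIFICATION":
            # The empty string stands in for a transaction identifier:
            # a notification never answers a user command.
            if tid != "":
                raise ValueError("notification with non-empty tid")
            self.notifications.append(message[2])
            return None
        # Anything else is assumed to answer a command sent earlier.
        if tid not in self.pending:
            raise ValueError(f"response for unknown transaction {tid!r}")
        return self.pending.pop(tid), message
```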
10.2 NFILE Command Response Errors

When an error prevents the successful completion of an NFILE command, a command response error is sent instead of the normal command response. A normal command response indicates success; a command response error indicates failure of the command.

NFILE command response errors are sent from the server to the user across the control connection as top-level token lists, in this format:

(ERROR tid three-letter-code error-vars message)

ERROR is a keyword. The tid is the transaction identifier of the command that encountered this error. The arguments three-letter-code, error-vars, and message are all required.

The three-letter-code identifies what kind of error was encountered. For a table of the three-letter codes and their meanings: See the section "NFILE Three-letter Error Codes", section 10.4.

message is a string that is displayed to the human user of the protocol.

error-vars is a keyword/value list. The three possible keywords are PATHNAME, OPERATION, and NEW-PATHNAME. Before transmitting an error, the server looks at the type of error to see whether it can easily determine the value of any of these keywords. If so, the server includes the keyword/value pair in its error. If not, the keyword/value pair is omitted. The value associated with OPERATION is the keyword naming the NFILE command that failed. The values associated with PATHNAME and NEW-PATHNAME are strings in the full pathname syntax of the server host.

For example, suppose the server on a file system with hierarchical directories could not access a file because its containing directory did not exist. The command error response would use the PATHNAME keyword to indicate the first directory level that did not exist, instead of the full pathname that was supplied as the command argument. This gives the user side information that it could not otherwise have obtained.
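The server omits any error-vars pair whose value it could not easily determine, so a user side should treat every keyword as optional. The following is a minimal Python sketch. It assumes the embedded error-vars list has already been decoded into a flat list that alternates keyword names and values. `parse_error` and `parse_error_vars` are hypothetical names.

```python
ERROR_VAR_KEYWORDS = {"PATHNAME", "OPERATION", "NEW-PATHNAME"}


def parse_error_vars(error_vars: list) -> dict:
    """Turn the alternating keyword/value list into a dict.

    Omitted keywords are simply absent from the result.
    """
    if len(error_vars) % 2:
        raise ValueError("error-vars must contain keyword/value pairs")
    result = {}
    for key, value in zip(error_vars[::2], error_vars[1::2]):
        if key not in ERROR_VAR_KEYWORDS:
            raise ValueError(f"unexpected error-vars keyword {key!r}")
        result[key] = value
    return result


def parse_error(message: list) -> dict:
    """Split (ERROR tid three-letter-code error-vars message) into fields."""
    keyword, tid, code, error_vars, text = message
    if keyword != "ERROR" or len(code) != 3:
        raise ValueError("not a command response error")
    return {"tid": tid, "code": code, "vars": parse_error_vars(error_vars), "message": text}
```

The strict keyword check matches the three keywords listed for command response errors. Asynchronous errors can also carry RESTARTABLE (section 10.3), so a parser shared between the two message types would need to accept that keyword as well.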
10.3 NFILE Asynchronous Errors

When a data channel process, in either direction, encounters an error condition, the server sends an asynchronous error description. An asynchronous error description consists of a top-level token list. Typically, asynchronous errors indicate error conditions in the transfer, such as running out of disk space or allocation, or an unreadable disk record.

The format of asynchronous error descriptions is:

(ASYNC-ERROR handle three-letter-code error-vars message)

ASYNC-ERROR is a keyword. The handle argument identifies the erring data channel. The arguments three-letter-code, error-vars, and message are all required. Their meanings are the same as in NFILE command error responses: See the section "NFILE Command Response Errors", section 10.2.

When the server detects an asynchronous error on an input data channel, the server sends an asynchronous error description on that data channel itself. When an asynchronous error occurs on an output data channel, the asynchronous error description is sent on the control connection.

Some asynchronous errors are restartable. In this context, restartable means it makes sense to try to resume the operation. One example of a restartable error is an attempt to write a file to a file system that is out of room. The server side indicates that an asynchronous error is restartable by prepending the keyword RESTARTABLE and the associated value Boolean truth to the error-vars list. To proceed from a restartable error, the user side sends a CONTINUE command over the control connection.

On any asynchronous error, either input or output, the data channel on the server side enters an "asynchronous error outstanding" state. The server can exit that state in one of two ways: by receiving a CONTINUE command, or by receiving a CLOSE command with the abort-p argument supplied as Boolean truth.

On a normal CLOSE (not a close-abort), the server side checks the channel it was requested to close. If an asynchronous error description has been sent on the data channel but not yet processed by CONTINUE, the server side does not close the channel, but sends a command error response. The same thing happens on a FINISH command received on a channel that has an asynchronous error pending. In both cases, the three-letter code included in the command error response is EPC, for Error Pending on Channel.
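The channel rules above amount to a small state machine. This Python sketch models one server-side data channel. The class, its method names, and the use of plain strings for keywords and codes are assumptions made for illustration. The commands, the RESTARTABLE convention, and the EPC code come from the text.

```python
class ServerDataChannel:
    """Hypothetical model of one server-side data channel's error state."""

    def __init__(self, handle: str):
        self.handle = handle
        self.error_outstanding = False
        self.closed = False

    def async_error(self, code: str, message: str, restartable: bool = False) -> list:
        """Enter the outstanding state and build the ASYNC-ERROR list."""
        self.error_outstanding = True
        # RESTARTABLE and Boolean truth are prepended to error-vars.
        error_vars = ["RESTARTABLE", True] if restartable else []
        return ["ASYNC-ERROR", self.handle, code, error_vars, message]

    def command(self, name: str, abort_p: bool = False) -> str:
        """Return "OK", or the three-letter code of a command error response."""
        if name == "CONTINUE":
            self.error_outstanding = False  # the user side resumes the operation
            return "OK"
        if name == "CLOSE" and abort_p:
            self.error_outstanding = False  # a close-abort always exits the state
            self.closed = True
            return "OK"
        if name in ("CLOSE", "FINISH") and self.error_outstanding:
            return "EPC"  # Error Pending on Channel; the channel stays open
        if name == "CLOSE":
            self.closed = True
        return "OK"
```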
10.4 NFILE Three-letter Error Codes

Usually the server's operating system provides some description of an error that occurs. NFILE has a mechanism for conveying that information to the user side. Upon detecting an error, the NFILE server should characterize the error by choosing the three-letter code that best describes it. The three-letter code is an argument in both the command response error and asynchronous error messages from the server to the user.

Each of the NFILE three-letter codes represents some system error. The set of codes enables all operating systems to use one error-reporting mechanism. Some operating systems will never encounter certain of the error conditions.

Some errors fit logically into two error codes. For example, suppose the server could not delete a file because the file was not found. This error could be considered either CDF (Cannot Delete File) or FNF (File Not Found). In this case, File Not Found gives more specific and valuable information than Cannot Delete File. Since the protocol does not allow more than one error code to be reported when an error occurs, the server must choose the most appropriate error code, given the information available to it from the operating system.

This is the set of three-letter codes:

ACC  Access error. This indicates a protection-violation error.
ATD  Incorrect access to directory. A directory could not be accessed because the user's access rights to it did not permit this type of access.
ATF  Incorrect access to file. A file could not be accessed because the user's access rights to it did not permit this type of access.
BUG  File system bug. This includes all protocol violations detected by the server, as well as by the host file system.
CCD  Cannot create directory. An error occurred in attempting to create a directory.
CDF  Cannot delete file. The file system reported that it cannot delete a file.
CCL  Cannot create link. An error occurred in attempting to create a link.
CIR  Circular link. An operation was attempted on a pathname that designates a link that eventually links back to itself.
CRF  Cannot rename file. An error occurred in attempting to rename a file.
CSP  Cannot set property. An error occurred in attempting to change the properties of a file. This could mean that you tried to set a property that only the file system is allowed to set, or a property that is not defined on this type of file system.
DAE  Directory already exists. A directory could not be created because a directory or file of this name already exists.
DAT  Data error. The file system contains unreadable data. This could mean data errors detected by hardware or inconsistent data inside the file system.
DEV  Device not found. The device of the file was not found or does not exist.
DND  "Do Not Delete" flag set. An attempt was made to delete a file that is marked by a "Do Not Delete" flag.
DNE  Directory not empty. An invalid deletion of a nonempty directory was attempted.
DNF  Directory not found. The directory was not found or does not exist. This refers specifically to the containing directory; if you are trying to access a directory, and the actual directory you are trying to access is not found, FNF (for File Not Found) should be indicated instead.
EPC  Error pending on channel. An asynchronous error is outstanding on the channel, so the server cannot close or finish the channel.
FAE  File already exists. The file could not be created because a file or directory of this name already exists.
FNF  File not found. The file was not found in the containing directory. The TOPS-20 and TENEX "no such file type" and "no such file version" errors should also report this condition.
FOO  File open for output. An attempt was made to open a file that was already opened for output.
FOR  Filepos out of range. An attempt was made to set the file pointer past the end-of-file position or to a negative position.
FTB  File too big. The file is larger than the maximum file size supported by the file system.
HNA  Host not available. The file server or file system is intentionally denying service to the user. This does not mean that the network connection failed; it means that the file system is explicitly not available.
IBS  Invalid byte size. The value of the "byte size" option was not valid.
ICO  Inconsistent options. Some of the options given in this operation are inconsistent with others.
IOD  Invalid operation for directory. The specified operation is invalid for directories, and the given pathname specifies a directory, in directory pathname as file format.
IOL  Invalid operation for link. The specified operation is invalid for links, and this pathname is the name of a link.
IP?  Invalid password. The specified password was invalid.
IPS  Invalid pathname syntax. This includes all invalid pathname syntax errors.
IPV  Invalid property value. The new value provided for the property is invalid.
IWC  Invalid wildcard. The pathname is not a valid wildcard pathname.
LCK  File locked. The file is locked. It cannot be accessed, possibly because it is in use by some other process.
LIP  Login problems. A problem was encountered while trying to log in to the file system.
MSC  Miscellaneous problems.
NAV  Not available. The file or device exists but is not available. Typically, the disk pack is not mounted on a drive, the drive is broken, or the like. Operator intervention is probably required to fix the problem, but retrying the operation is likely to succeed after the problem is solved.
NER  Not enough resources. For example, a system limit on the number of open files or network connections has been reached.
NET  Network problem. The file server had some sort of trouble trying to create a new data connection, or perform some other network operation, and was unable to do so.
NFS  No file system. The file system was not available. For example, this host does not have any file systems, or this host's file system cannot be initialized or accessed for some reason, or the file system simply does not exist.
NLI  Not logged in. A file operation was attempted before logging in. Normally the file system interface always logs in before doing any operation, but this problem can occur in certain unusual cases in which logging in has been aborted.
NMR  No more room. The file system is out of room. This can mean any of several things:
     - The entire file system is full.
     - The particular volume involved is full.
     - The particular directory involved is full.
     - The user's allocated quota has been exceeded.
RAD  Rename across directories. The devices or directories of the initial and target pathnames are not the same, but on this file system they are required to be.
REF  Rename to existing file. The target name of a rename operation is the name of a file that already exists.
UKC  Unknown operation. An unsupported file system operation was attempted, or an unsupported command was attempted.
UKP  Unknown property. The property is unknown.
UNK  Unknown user. The specified user name is unknown to this host.
UUO  Unimplemented option. An option to a command is not implemented.
WKF  Wrong kind of file. This includes errors in which an invalid operation for a file, directory, or link was attempted.
WNA  Wildcard not allowed.
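The rule of choosing the most specific code can be expressed as a two-level lookup. The server first consults the operating system's own error and only then falls back to a generic code for the failed operation. This sketch assumes a POSIX-style server that reports `errno` values. The particular errno-to-code pairs and the command keywords in the fallback table are one plausible choice, not something NFILE prescribes.

```python
import errno

# Assumed mapping from errno names to NFILE codes; names a platform lacks
# are skipped. A real server would also check whether the containing
# directory exists, to report DNF instead of FNF.
_SPECIFIC = {
    "ENOENT": "FNF",
    "EACCES": "ACC",
    "EPERM": "ACC",
    "EEXIST": "FAE",
    "ENOTEMPTY": "DNE",
    "ENOSPC": "NMR",
    "EDQUOT": "NMR",  # exceeding a quota is one form of "no more room"
    "ELOOP": "CIR",
    "EXDEV": "RAD",
    "EMFILE": "NER",
    "ENFILE": "NER",
    "EIO": "DAT",
}
SPECIFIC_CODES = {
    getattr(errno, name): code for name, code in _SPECIFIC.items() if hasattr(errno, name)
}

# Generic per-operation codes (illustrative command keywords).
FALLBACK_CODES = {
    "DELETE": "CDF",
    "RENAME": "CRF",
    "CREATE-DIRECTORY": "CCD",
    "CREATE-LINK": "CCL",
}


def choose_error_code(err_no: int, operation: str) -> str:
    """Prefer the specific code (e.g. FNF over CDF), else the operation's, else MSC."""
    if err_no in SPECIFIC_CODES:
        return SPECIFIC_CODES[err_no]
    return FALLBACK_CODES.get(operation, "MSC")
```

Deleting a missing file therefore reports FNF rather than CDF, matching the example above. CDF appears only when the operating system error says nothing more specific.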
11. TOKEN LIST TRANSPORT LAYER

PURPOSE: The Token List Transport Layer is a protocol that facilitates the transmission of simple structured data, such as lists.

11.1 Introduction to the Token List Transport Layer

The Token List Transport Layer is a general-purpose protocol. The Token List Transport Layer sends "tokens" through its underlying stream. Each token usually represents a simple quantity, such as a string or integer.

Tokens can be organized into "token lists". Special tokens are provided to denote the starting and ending point of lists. The token list transport layer differentiates between "top-level token lists", which are not contained in other lists, and "embedded token lists", which are contained in other lists. Using lists makes it convenient to send structured records, such as commands and command responses of the client protocol. The top-level token lists provide robustness.

The Token List Transport Layer is a general term that includes two separate but related subjects: the "token list stream" and the "token list data stream". The token list stream is commonly used for applications that can easily organize the information to be transmitted into tokens and lists. The token list data stream is more appropriate for transmitting a large volume of data that cannot easily be structured into tokens and lists, such as file data, which is simply a sequence of characters or bytes.

The following table illustrates the main differences between token list streams and token list data streams:

                    Token List Data Stream    Token List Stream
                    ----------------------    -----------------
Built on:           token list stream         Byte Stream with Mark
Transmits:          stream data               tokens, token lists
Example of use:     NFILE data channels       NFILE control connection
NFILE uses the Token List Transport Layer, and provides an excellent example of its usefulness. The NFILE commands and command responses are sent over the control connection in a token list stream. File data is sent across each data channel in a token list data stream.

11.2 Token List Stream

11.2.1 Types of Tokens and Token Lists

All numbers in the token list documentation are represented in decimal notation. Bytes are 8 bits long.

TYPES OF TOKENS

Tokens are of the following types:

1. Atomic tokens. Atomic tokens are of the following subtypes:

- Data tokens. A data token consists of a sequence of bytes with an effectively unlimited length (the four-byte length field described below allows up to 2^32 - 1 bytes). In some contexts a data token represents a string; in other contexts, a data token is other arbitrary data. Each data token is preceded in the token list stream by a representation of its length in bytes. Data tokens that are under 200 bytes long are preceded by one byte containing their length in bytes. That is, a data token of 34 bytes is preceded by one byte of value 34. Data tokens 200 bytes or over are preceded by the byte known as PUNCTUATION-LONG, of value 201. After the 201 comes a four-byte number (least significant byte first) containing the length of the data token that follows.
- Numeric tokens. A sequence of bytes that represent and encode a nonnegative binary integer. The largest valid integer is 2^63 - 1. Numeric tokens are either short integers (less than 256) or long integers (greater than or equal to 256). Short integers are preceded by the byte known as PUNCTUATION-SHORT-INTEGER, of value 206.
Long integers are begun by PUNCTUATION-LONG-INTEGER, of value 207. One byte follows, containing the length (in bytes) of the long integer. The integer itself is next, least significant byte first.
- Keyword tokens. A sequence of bytes that represent and encode a named identifier of the implemented protocol. Keyword tokens are used by the client protocol to convey a name; the only significance of a keyword token is in its name. Each keyword is preceded by the byte known as PUNCTUATION-KEYWORD, of value 208. The data token following PUNCTUATION-KEYWORD represents the name of the keyword as a string. The characters are in upper-case standard ASCII.
- Boolean truth. A special token that represents the Boolean truth value. This token is known as BOOLEAN-TRUTH, of value 209 <4>.

2. Control tokens. The token list stream supports four control tokens to delimit token lists, and one padding token.

TOP-LEVEL-LIST-BEGIN   202   This control token appears at the start of each top-level token list.
TOP-LEVEL-LIST-END     203   This control token appears at the end of each top-level token list.
LIST-BEGIN             204   This control token appears at the start of each embedded token list.
LIST-END               205   This control token appears at the end of each embedded token list.
PUNCTUATION-PAD        200   This padding token should be ignored by the token list stream. It can be sent to fill buffers.
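The byte layouts above are compact enough to express as a small encoder. This Python sketch rests on several assumptions. Python `str` values are sent as ASCII data tokens, and the NFILE character set is not modeled. `bytes` values are sent as raw data tokens. A hypothetical `Keyword` wrapper marks keyword tokens. A short integer is sent as 206 followed by one byte holding the value.

```python
PUNCTUATION_LONG = 201
TOP_LEVEL_LIST_BEGIN, TOP_LEVEL_LIST_END = 202, 203
LIST_BEGIN, LIST_END = 204, 205
PUNCTUATION_SHORT_INTEGER, PUNCTUATION_LONG_INTEGER = 206, 207
PUNCTUATION_KEYWORD, BOOLEAN_TRUTH = 208, 209


class Keyword(str):
    """A str to be sent as a keyword token (only the name is sent)."""


def encode_data(data: bytes) -> bytes:
    if len(data) < 200:
        return bytes([len(data)]) + data
    # PUNCTUATION-LONG, then a four-byte length, least significant byte first.
    return bytes([PUNCTUATION_LONG]) + len(data).to_bytes(4, "little") + data


def encode_integer(n: int) -> bytes:
    if not 0 <= n <= 2**63 - 1:
        raise ValueError("numeric tokens are nonnegative and at most 2^63 - 1")
    if n < 256:
        return bytes([PUNCTUATION_SHORT_INTEGER, n])
    length = (n.bit_length() + 7) // 8
    return bytes([PUNCTUATION_LONG_INTEGER, length]) + n.to_bytes(length, "little")


def encode(obj) -> bytes:
    """Encode one atomic token or one embedded token list."""
    if obj is True:
        return bytes([BOOLEAN_TRUTH])
    if obj is None or obj is False:
        return bytes([LIST_BEGIN, LIST_END])  # the empty token list
    if isinstance(obj, Keyword):
        return bytes([PUNCTUATION_KEYWORD]) + encode_data(obj.upper().encode("ascii"))
    if isinstance(obj, str):
        return encode_data(obj.encode("ascii"))
    if isinstance(obj, bytes):
        return encode_data(obj)
    if isinstance(obj, int):
        return encode_integer(obj)
    if isinstance(obj, list):
        return bytes([LIST_BEGIN]) + b"".join(encode(x) for x in obj) + bytes([LIST_END])
    raise TypeError(f"cannot encode {type(obj).__name__}")


def encode_top_level(items: list) -> bytes:
    """Encode one complete top-level token list."""
    body = b"".join(encode(x) for x in items)
    return bytes([TOP_LEVEL_LIST_BEGIN]) + body + bytes([TOP_LEVEL_LIST_END])
```

As a check, `encode_top_level([Keyword("DELETE"), "t105", [], "/usr/max/temp"])` produces the 31 bytes listed in the DELETE example of section 11.2.2, beginning 202, 208, 006, 068.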
TOKEN LISTS

A token list consists of a sequence of atomic tokens or token lists. Token lists are begun and ended by control tokens that delimit the token lists. There are three types of token lists:

1. Top-level token lists. Top-level token lists begin with TOP-LEVEL-LIST-BEGIN and end with TOP-LEVEL-LIST-END. Top-level token lists are not contained in other lists.
2. Embedded token lists. These token lists occur inside other token lists. They begin with LIST-BEGIN and end with LIST-END.
3. The empty token list. This is a special case of the embedded token list. In some contexts, the empty token list represents Boolean falsity. An embedded empty token list is composed of a LIST-BEGIN followed immediately by a LIST-END. A top-level empty token list is composed of TOP-LEVEL-LIST-BEGIN followed immediately by TOP-LEVEL-LIST-END.

11.2.2 Token List Stream Example

This section contains an example of some data that can appear on a token list stream. The example is a top-level token list encoding an NFILE DELETE command. The DELETE command is composed of the following pieces: a TOP-LEVEL-LIST-BEGIN, the keyword DELETE, a data token containing the transaction identifier, a LIST-BEGIN, a LIST-END, a data token containing a pathname of a file to be deleted, and a TOP-LEVEL-LIST-END. This example uses t105 as the transaction identifier, and /usr/max/temp as the pathname. All numbers in this section are expressed in decimal notation.

The pieces of the command are displayed here in order:

1. TOP-LEVEL-LIST-BEGIN
2. The keyword token whose name is DELETE
3. The data token containing the characters: t105
4. LIST-BEGIN
5. LIST-END
6. The data token containing the characters: /usr/max/temp
7. TOP-LEVEL-LIST-END

Now, let's translate each piece of the command into the bytes that are transmitted through the token list stream.

1. TOP-LEVEL-LIST-BEGIN

202 represents TOP-LEVEL-LIST-BEGIN

2. The keyword token whose name is DELETE.

A keyword token is introduced by PUNCTUATION-KEYWORD, which is represented in the token list stream as the byte 208. A data token follows, containing the string "DELETE". A data token under 200 bytes long is introduced by one byte containing its length in bytes. The length of this data token is 6 bytes. The data token continues with the standard ASCII character set representation of each character in the string DELETE:

208 represents PUNCTUATION-KEYWORD
006 represents the length of this data token
068 represents "D"
069 represents "E"
076 represents "L"
069 represents "E"
084 represents "T"
069 represents "E"

3. The data token containing the characters: t105

This data token is begun by its length in bytes (4), and continues with the NFILE character set representation of each character in the string:

004 represents the length of this data token
116 represents "t"
049 represents "1"
048 represents "0"
053 represents "5"

4. LIST-BEGIN

204 represents LIST-BEGIN
5. LIST-END

205 represents LIST-END

6. The data token containing the characters: /usr/max/temp

013 represents the length of this data token
047 represents "/"
117 represents "u"
115 represents "s"
114 represents "r"
047 represents "/"
109 represents "m"
097 represents "a"
120 represents "x"
047 represents "/"
116 represents "t"
101 represents "e"
109 represents "m"
112 represents "p"

7. TOP-LEVEL-LIST-END

203 represents TOP-LEVEL-LIST-END

11.2.3 Mapping of Lisp Objects to Token List Stream Representation

The Symbolics interface to the token list stream sends Lisp objects through the underlying Byte Stream with Mark and produces Lisp objects on the other end. Not all Lisp objects can be sent in this way; for example, compound objects other than lists are not handled. An appropriate analogy is the sending and reconstruction of list structure via printed representation.

These are the types of objects that can be sent, and their representations:

- Lisp strings are represented as data tokens in the NFILE character set. Only 8-bit strings can be sent <5>.
- Keyword symbols are represented as keyword tokens. They can be identified and reconstructed as keyword symbols, but only their names are sent. Any properties, bindings, and the like are not sent.
- T is represented as BOOLEAN-TRUTH.
- NIL is represented as the empty token list.
- Lists are represented as token lists. Circular lists cannot
be sent. See the footnote related to the ambiguity between NIL and the empty list: See the section "Types of Tokens and Token Lists", section 11.2.1.
- Integers are represented as numeric tokens. Only nonnegative integers less than 2^63 can be sent.

11.2.4 Aborting and the Token List Stream

A token list stream accrues the benefits of the abort management policy of the Byte Stream with Mark on which it is built. To fully realize this benefit, any implementation of the token list stream must obey some simple rules.

The term "transmission" means either an atomic token or a complete top-level token list. A top-level token list starts with the control token TOP-LEVEL-LIST-BEGIN, ends with TOP-LEVEL-LIST-END, and can contain embedded token lists.

The interface that writes to the token list stream must be capable of writing the representation of entire transmissions. When this interface is called, it must effectively lock the token list stream, and exclude access by other processes until the entire transmission has been encoded and sent. If the sending is aborted while the stream is locked, the stream enters an "unsafe" state. Trying to send data while the stream is unsafe signals an error. The application and the token list stream must send a mark to cause resynchronization and allow the token list stream to be used again. When the reading side encounters this mark, it resynchronizes itself according to whatever client protocol is in use.

Similarly, the interface that reads from the token list stream must be capable of reading entire transmissions. When this interface is called, it must lock the stream, excluding access by other processes until the entire transmission has been read. If the reading is aborted while the stream is locked, the stream enters an unsafe state. The only exit from this unsafe state is by means of receiving a mark.
When the stream is unsafe, the only valid operation that can be performed upon it is "read and discard all tokens until a mark is encountered; read and discard that mark; declare the stream safe again".
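The writing rules can be sketched with an ordinary lock and an unsafe flag. In this Python sketch, `sink`, `MARK`, and the class names are hypothetical. An exception raised while a transmission is being written stands in for a user abort, much as CONTROL-ABORT would interrupt the sending process.

```python
import threading

# Hypothetical sentinel standing in for a Byte Stream with Mark "mark".
MARK = object()


class UnsafeStreamError(Exception):
    """Raised when ordinary output is attempted on an unsafe stream."""


class TokenListWriter:
    def __init__(self, sink):
        self._sink = sink              # callable receiving encoded chunks or MARK
        self._lock = threading.Lock()
        self.unsafe = False

    def _write_all(self, chunks):
        try:
            for chunk in chunks:
                self._sink(chunk)
        except BaseException:
            self.unsafe = True         # the transmission was cut short
            raise

    def send(self, chunks):
        """Send one complete transmission while holding the lock."""
        with self._lock:
            if self.unsafe:
                raise UnsafeStreamError("stream is unsafe; send with a mark")
            self._write_all(chunks)

    def send_with_mark(self, chunks):
        """Send a mark, then a transmission: the only output allowed while unsafe."""
        with self._lock:
            self._write_all([MARK, *chunks])
            self.unsafe = False
```

If `send_with_mark` is itself interrupted, the flag stays set, so the next attempt must again begin with a mark.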
Depending on the client protocol, the receipt of a mark might cause the reading side to read for further marks. NFILE implements the resynchronization of token list streams, and serves as a useful example: See the section "NFILE Control Connection Resynchronization", section 9.1.

The Symbolics implementation provides the two mark-handling primitives in this way:
1. Send token (or list) preceded by a mark. When the stream is in the unsafe state (on the output side), this is the only permitted output operation (other than closing).
2. Read through to a mark and read the token (or list) following the mark. When the stream is in the unsafe state (on the input side), this is the only permitted input operation (other than closing).

11.3 Token List Data Stream

The token list data stream is a facility to transmit stream data through a token list stream. The token list data stream imposes the following protocol on the data transmitted:
- Data is sent in the format of loose data tokens, not contained in token lists.
- The keyword token EOF indicates that the end of data has been reached.
- Token lists can be transmitted through the token list data stream.
- No loose tokens other than data tokens or the keyword token EOF can be sent.
- Boundaries between data tokens are not significant. The data is considered to be a continuous stream, with the possible exception of marks.

The token list data stream is most appropriate for sending file data. It is expected (but not required) that its typical mode of use is to send a large number of data tokens, with an occasional token list. The design intent was that token lists would be used by the application program to indicate exceptional situations. Data tokens, the keyword token EOF, and token lists are defined in
the token list stream documentation: See the section "Types of Tokens and Token Lists", section 11.2.1. The NFILE file protocol provides a good example of the use of token list data streams. NFILE sends file data through token list data streams; each NFILE data channel is a token list data stream. Errors such as disk errors during the reading of a file are conveyed as token lists through the token list data stream.

12. BYTE STREAM WITH MARK

PURPOSE: Byte Stream with Mark is a simple layer of protocol that guarantees that an out-of-band signal can be transmitted in the case of program interruption. Byte Stream with Mark is designed to provide end-to-end stream consistency in the face of user program aborts.

12.1 Discussion of Byte Stream with Mark

INTRODUCTION

Byte Stream with Mark is a reliable, bidirectional byte stream with one out-of-band (but not out-of-sequence) signal called a "mark". The design of Byte Stream with Mark ensures that the mark is always recognizable on the receiving end.

The Byte Stream with Mark is built on an underlying stream, which must support the transmission of 8-bit bytes. Byte Stream with Mark has been implemented to run on TCP and Chaos. Marks are implemented differently on the two protocols.

Marks are used to resynchronize the stream when something has occurred to interrupt normal operations. For example, an application layer sending data over the Byte Stream with Mark can abort in the middle of sending that data. Recovery is handled by sending a mark.

In the context of this document, "aborting" is defined as follows: Aborting the current execution of a program means to halt that execution and to abandon it, never to complete it. The data representing the state of the execution are irrevocably discarded.

EXAMPLE OF USE

Byte Stream with Mark is the layer of protocol underlying NFILE. NFILE uses the marks implemented in Byte Stream with Mark to resynchronize control connections or data channels whose synchronization has been lost.
For a description of NFILE's use of marks to resynchronize streams: See the section "NFILE Resynchronization Procedure", section 9.
BYTE STREAM WITH MARK ON CHAOSNET

A mark is recognized on Chaosnet by a packet bearing the opcode 201 (octal). There is no data in a mark packet, so the data portion of the packet is ignored. Byte Stream with Mark transmits all data in packets bearing opcode 200 (octal). If Byte Stream with Mark is implemented on another (non-Chaos) stream that supports opcode-bearing packets, the recommended implementation is the reservation of an opcode for the mark.

BYTE STREAM WITH MARK ON TCP: RECORD MODE

The purpose of Byte Stream with Mark is to guarantee that marks can always be unambiguously identified. Therefore, for TCP (and for any transport layer that does not implement packets natively) a simple record stream is imposed on the stream. The record boundaries serve only to distinguish where a mark can occur. A record consists of a two-byte byte count, most significant byte first, followed by that many bytes of data. A byte count of zero is recognized as a mark.

Both the sending side and the receiving side must rigorously maintain the integrity of the record boundaries. A writer to the stream must never output a byte count without that number of data bytes following. Similarly, a reader of the stream, after reading a byte count, has effectively contracted to read that many bytes from the encapsulated stream, regardless of whether those bytes are requested by the application layer.

MAINTAINING RECORD INTEGRITY

This subsection deals with maintaining record integrity on non-Chaos networks. Since Chaos implements packets natively, no special care is required to maintain record integrity on the Chaos network. The design discussed here guarantees record integrity; the underlying stream must guarantee data integrity. The basic design of Byte Stream with Mark on TCP (and other transport layers that do not implement packets natively) is to preserve record integrity by putting clearly demarcated, byte-counted records in the natural records of the encapsulated stream.
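The TCP record format (a two-byte count, most significant byte first, with a zero count meaning a mark) can be sketched in Python as follows. This is illustrative only; the names `encode_record`, `encode_mark`, `RecordDecoder`, `discard_through_mark`, and the `MARK` sentinel are hypothetical. The decoder keeps the remaining byte count of the current record between calls, so it copes with receive buffers that split a record or even its two-byte count, which is the situation discussed in the text on record integrity.

```python
MARK = object()  # hypothetical sentinel reported by the decoder for each mark


def encode_record(data: bytes) -> bytes:
    """Frame data as one record: a 2-byte count, most significant byte first."""
    if not 0 < len(data) <= 0xFFFF:
        raise ValueError("a data record must carry 1..65535 bytes")
    return len(data).to_bytes(2, "big") + data


def encode_mark() -> bytes:
    """A mark is a record whose byte count is zero."""
    return b"\x00\x00"


class RecordDecoder:
    """Incremental decoder that tolerates arbitrary splitting of the byte stream."""

    def __init__(self):
        self._header = b""   # bytes of a byte count received so far (0 or 1)
        self._remaining = 0  # data bytes still owed by the current record

    def feed(self, chunk: bytes):
        """Yield data fragments (bytes) and MARK for each mark found in chunk."""
        pos = 0
        while pos < len(chunk):
            if self._remaining:
                # Inside a record: the reader is committed to consuming its bytes.
                take = min(self._remaining, len(chunk) - pos)
                yield chunk[pos:pos + take]
                self._remaining -= take
                pos += take
            else:
                # Between records: collect the two-byte count, which may be split.
                piece = chunk[pos:pos + 2 - len(self._header)]
                self._header += piece
                pos += len(piece)
                if len(self._header) == 2:
                    count = int.from_bytes(self._header, "big")
                    self._header = b""
                    if count == 0:
                        yield MARK
                    else:
                        self._remaining = count


def discard_through_mark(items):
    """After an abort: drop everything up to and including the next mark."""
    items = iter(items)
    for item in items:
        if item is MARK:
            return list(items)
    return []  # no mark seen yet; a real reader would keep waiting


# Example: a record, a mark, and another record delivered in 3-byte pieces.
wire = encode_record(b"hello") + encode_mark() + encode_record(b"world!")
decoder = RecordDecoder()
items = []
for i in range(0, len(wire), 3):
    items.extend(decoder.feed(wire[i:i + 3]))
```

Because the decoder always finishes the current record before looking for the next count, a mark can only be recognized at a record boundary, which is exactly the property the record layer exists to guarantee.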
Therefore, when the outer stream requests a buffer's worth of file data from the encapsulated stream, it expects to receive a buffer containing one entire, integral record of that stream, complete with byte count. Because of diverse network implementations on different operating systems, the software that implements the encapsulated stream might
not be able to provide integral record buffers to the Byte Stream with Mark implementation. For example, the writing stream could have written records that are much longer than available buffers on the receiving system. In this case, a request to read from the encapsulated stream returns some buffer or some amount of data representing less than an entire Byte Stream with Mark record. The input subroutine of the Byte Stream with Mark implementation must therefore return a region of this (smaller) buffer, representing less than the full Byte Stream with Mark record. Nevertheless, the Byte Stream with Mark must extract the count of the full Byte Stream with Mark record from the first such buffer of each Byte Stream with Mark record, and maintain and update this count as succeeding component buffers are read.

In this case, if the program reading from the Byte Stream with Mark aborts while reading data, the implementation of Byte Stream with Mark must continue to read through the remaining buffers of the Byte Stream with Mark record that has been subdivided in this fashion. The user side program will have determined that an abort has occurred, and will request the Byte Stream with Mark to read up to and through the next mark. The Byte Stream with Mark will have processed a fractional record, and must discard the remaining buffers of the record now being read.

12.2 Byte Stream with Mark Abortable States

Byte Stream with Mark is designed to provide end-to-end stream consistency in the face of user program aborts. This section describes user program aborts, and how Byte Stream with Mark handles them.

In the context of this document, "aborting" is defined as follows: Aborting the current execution of a program means to halt that execution and to abandon it, never to complete it. The data representing the state of the execution are irrevocably discarded.
USER PROGRAM ABORTS AND I/O STREAMS

Aborting the execution of the code that manipulates I/O streams, in general, poses significant problems. Given that a stream is a static data object, and is intended to be used over and over again, aborting the execution of any routine manipulating a stream can leave it in an inconsistent, unusable state. Many operating systems solve this problem by manipulating a large subset of streams within the confines of the supervisor or executive program, which is not vulnerable to aborts, short of system or network failure. Nevertheless, the need still exists to implement streams outside of the boundaries of the supervisor. Furthermore,
the Symbolics computer environment has no supervisor or executive program, and is thus vulnerable to aborts everywhere.

BYTE STREAM WITH MARK HANDLING OF USER PROGRAM ABORTS

Byte Stream with Mark is designed to be nearly impervious to the aborting of programs using it. Its design is based on careful analysis of all possible states of the stream, and of the effect of aborts of the programs using the stream in each of these states. This section provides that analysis.

A "transmission" is a collection of user data sent by the application level through the Byte Stream with Mark whose end is well-defined once its start has been recognized. For instance, the token list stream, when using Byte Stream with Mark, sends token lists. When a TOP-LEVEL-LIST-BEGIN has been sent, the containing transmission is not considered complete until the corresponding TOP-LEVEL-LIST-END is read. See the section "Token List Transport Layer", section 11.

The following cases are possible states of the stream when an abort occurs:
1. Abort occurs when the user program is not manipulating the stream. This case presents no problem.
2. Abort occurs after a transmission has been partially sent, at a packet or record boundary. This implies that the datum that would indicate the successful complete sending of that transmission has not yet been sent. The Byte Stream with Mark state is consistent, but the application level state is not. The application level must determine that the execution of the code composing and sending its transmission was, in fact, aborted, and initiate resynchronization via marks. The receiving side must be careful not to act upon a transmission (that is, to perform any action or side effect) until the transmission has been received in its entirety. This protects the user program from the possibility that an abort can occur after a transmission has been partially sent.
3. Abort occurs during the sending or receiving of a record. This is the most vulnerable state of the mechanism. This case does not occur on packet-oriented media; it is subsumed by the next case. This case is handled by minimizing the extent of this window, and killing the connection when and if the situation is detected. Depending on the operating system involved, this window could be minimized by using interrupt-disabling mechanisms, auxiliary processes or tasks, or some other technique. For buffered streams, input and output waiting can be done in consistent states, thus minimizing the amount of time spent manipulating the actual encapsulated stream. For unbuffered streams, a lot of time can be spent in this window. It is expected that unbuffered streams will be exceedingly uncommon. Nevertheless, the implementation of Byte Stream with Mark must detect this case.
4. Abort occurs during the sending or receiving of fundamental units of the lowest-level underlying stream (packets, buffers, or bytes). This case is usually handled by inhibiting interrupts, or other forms of masking, in the code implementing the encapsulated stream, since no waiting is possible at unexpected times.

13. POSSIBLE FUTURE EXTENSIONS

NFILE was designed to be extended as the needs of its clients grow, or as new clients with different needs appear. Currently it meets the needs of the Symbolics Genera 7.0 operating system, although its design is intentionally general. If users of other operating systems identify new features that would be useful, they could be added to NFILE. This section illustrates some areas where the design of NFILE intentionally accommodates extensions.
- The NFILE protocol encodes commands and responses as text, rather than using prearranged numbers. This means that new commands and responses can be added without having to obtain a new number from a central registry.
- The Token List Transport Layer provides a general substrate for the value-transmission portion of network protocols. In fact, it has been used at Symbolics for other protocols
besides NFILE. The Token List Transport Layer could conveniently be extended to support transmission of other types of values besides those it currently supports.
- The character set to be used for file transfer could be made negotiable.
- The command character set could be made negotiable. Currently there is no negotiation sequence, but one could be added.
- Greater support for more complex file organizations could be added, such as record files, databases, and so on. This could be an extension to the direct access mode facility.
- Currently, the LOGIN command allows the user side to inform the server which version of NFILE it is running. This feature is included in NFILE so that a server can continue to support older versions of the protocol even after new, extended versions have been implemented. However, the specification is currently somewhat vague as to how the server can make use of the version.
- NFILE is not restricted to using TCP or Chaos as its underlying protocol. NFILE can be built on any byte stream protocol that supports reliable transmission of 8-bit bytes and multiple connections.

In addition to the possible future extensions, we would like to mention a known limitation of NFILE. Currently NFILE requires multiple connections for a single session. That is, the control connection must be separate from the data connections. If NFILE is to be used over a telephone line, this requirement poses an inconvenient restriction. It is possible to implement a multiplexing scheme as a level between NFILE and the communication medium.
APPENDIX A NORMAL TRANSLATION MODE

NORMAL translation mode guarantees the following:
- A file containing characters in the NFILE character set can be written to any NFILE server and read back intact (containing the same characters).
- A file written by NFILE should not appear as "foreign" to a server operating system unless the file contains NFILE's extended characters. That is, a server file that uses only the subset of the NFILE character set limited to standard ASCII characters (the 95 printing characters, and the native representation of return, linefeed, page, backspace, rubout, and tab) can be read and written, with the result being the same data in NFILE characters as exists in server characters.

In this section, all numbers designating values of character codes are to be interpreted in octal. The notation "x in c1..c2" means "for all character codes x such that c1 <= x <= c2."

The NFILE character set is an extension of standard ASCII. The 95 ASCII printing characters have the same numerical codes in the NFILE character set. Five ASCII non-printing characters have counterparts in the NFILE character set, as shown in the following table. The NFILE character set includes a single Return character, rather than the carriage-return line-feed sequence typically used in ASCII. The NFILE character set does not include the ASCII control characters, other than the five shown in the following table, but does include some additional printing and formatting characters that have no counterparts in ASCII.

   Character      NFILE      Standard ASCII
   Rubout         207        177
   Backspace      210        010
   Tab            211        011
   Linefeed       212        012
   Page           214        014

Note that the NFILE Return character has code 215. This character includes "going to the next line". This is a notable difference from the convention used in PDP-10 ASCII, in which lines are ended by a pair of characters, "carriage return" and "line feed".
NORMAL TRANSLATION TO UNIX SERVERS

The translations given in Tables 1 and 2 are appropriate for use by UNIX servers, or other servers that use 8-bit bytes to store ASCII characters. Machines with 8-bit bytes usually place the extra NFILE characters in the top half of their character set.

TABLE 1. TRANSLATIONS FROM NFILE CHARACTERS TO UNIX CHARACTERS

   NFILE character      UNIX character
   x in 000..007        x
   x in 010..015        x + 200
   x in 016..176        x
   177                  377
   x in 200..207        x
   x in 210..211        x - 200
   212                  015
   x in 213..214        x - 200
   215                  012
   x in 216..376        x
   377                  177

TABLE 2. TRANSLATIONS FROM UNIX CHARACTERS TO NFILE CHARACTERS

   UNIX character       NFILE character
   x in 000..007        x
   x in 010..011        x + 200
   012                  215
   x in 013..014        x + 200
   015                  212
   x in 016..176        x
   177                  377
   x in 200..207        x
   x in 210..215        x - 200
   x in 216..376        x
   377                  177

NORMAL TRANSLATION TO PDP-10 FAMILY SERVERS

The translations given in Tables 3 and 4 are appropriate for use by PDP-10 family servers, or other servers that use 7-bit bytes to store ASCII characters. On the PDP-10 the sequence CRLF, 015 012, represents a new line.
The mechanism for this translation on machines with 7-bit bytes is to use the RUBOUT character (octal code 177) as an escape character.

TABLE 3. TRANSLATIONS FROM NFILE TO PDP-10 CHARACTERS

   NFILE character      PDP-10 character(s)
   x in 000..007        x
   x in 010..012        177 x
   013                  013
   x in 014..015        177 x
   x in 016..176        x
   177                  177 177
   x in 200..207        177 x - 200
   x in 210..212        x - 200
   213                  177 013
   214                  014
   215                  015 012
   x in 216..376        177 x - 200
   377                  no corresponding code

These tables might seem confusing at first, but some general rules should make them clearer. First, NFILE characters in the range 000..177 are generally represented as themselves, and x in 200..377 is generally represented as 177 followed by x - 200. That is, 177 is used to quote the second 200 NFILE characters. It was deemed that 177 is a more useful and common character than 377, so 177 177 means 177, and there is no way to describe 377 with PDP-10 ASCII characters.

In the NFILE character set, the formatting control characters appear offset up by 200 with respect to standard ASCII. This explains why the preferred mode of expressing 210 (backspace) is 010, and 010 turns into 177 010. The same reasoning applies to 211 (Tab), 212 (Linefeed), 214 (Formfeed), and 215 (Return).

More special care is needed for the Return character, which is the mapping of the system-dependent representation of "the start of a new line". The NFILE Return (215) is equivalent to 015 012 (CRLF) in some ASCII systems. In the NFILE character set there is no representation of a carriage return that doesn't go to a new line, so if there is one in a server file, it must be translated to something else. When converting ASCII characters to NFILE characters, an 015 followed by an 012 therefore turns into a 215. A stray CR is arbitrarily translated into a single M (115).

TABLE 4. TRANSLATIONS FROM PDP-10 CHARACTERS TO NFILE CHARACTERS

   PDP-10 character     NFILE character
   x in 000..007        x
   x in 010..012        x + 200
   013                  013
   014                  214
   015 012              215
   015 not-012          115
   x in 016..176        x
   177 x in 000..007    x + 200
   177 x in 010..012    x
   177 013              213
   177 x in 014..015    x
   177 x in 016..176    x + 200
   177 177              177
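Tables 3 and 4 can be expressed as a pair of Python functions operating on lists of integer character codes. This is a sketch under stated assumptions, not a reference implementation: the function names are hypothetical, NFILE input is assumed to lie in 000..377, PDP-10 input is assumed to consist of 7-bit codes (000..177), and the character following a stray 015 is assumed to be translated normally after the 015 becomes 115.

```python
def nfile_to_pdp10(codes):
    """Table 3: translate NFILE character codes to PDP-10 (7-bit) codes."""
    out = []
    for x in codes:
        if x <= 0o007 or x == 0o013 or 0o016 <= x <= 0o176:
            out.append(x)                    # represented as itself
        elif 0o010 <= x <= 0o012 or 0o014 <= x <= 0o015:
            out += [0o177, x]                # quoted with Rubout
        elif x == 0o177:
            out += [0o177, 0o177]            # 177 177 means 177
        elif 0o210 <= x <= 0o212 or x == 0o214:
            out.append(x - 0o200)            # preferred form for formatting chars
        elif x == 0o215:
            out += [0o015, 0o012]            # Return becomes CRLF
        elif x == 0o377:
            raise ValueError("NFILE 377 has no PDP-10 representation")
        else:
            out += [0o177, x - 0o200]        # 200..207, 213, 216..376
    return out


def pdp10_to_nfile(codes):
    """Table 4: translate PDP-10 codes (assumed 000..177) back to NFILE codes."""
    out, i = [], 0
    while i < len(codes):
        c = codes[i]
        if c == 0o177:                       # Rubout escape: look at the next code
            if i + 1 == len(codes):
                raise ValueError("177 at end of input has nothing to quote")
            q = codes[i + 1]
            i += 2
            if q == 0o177 or 0o010 <= q <= 0o012 or 0o014 <= q <= 0o015:
                out.append(q)                # 177 177 -> 177; 177 x -> x
            elif q == 0o013:
                out.append(0o213)
            else:
                out.append(q + 0o200)        # quoted 000..007 and 016..176
        elif c == 0o015:
            if i + 1 < len(codes) and codes[i + 1] == 0o012:
                out.append(0o215)            # CRLF -> Return
                i += 2
            else:
                out.append(0o115)            # stray CR -> "M"
                i += 1
        else:
            if 0o010 <= c <= 0o012 or c == 0o014:
                out.append(c + 0o200)        # formatting chars move up by 200
            else:
                out.append(c)                # 000..007, 013, 016..176
            i += 1
    return out
```

Encoding every NFILE code from 000 through 376 and decoding the result should give back the original codes, which is a quick way to check that the two tables are inverses of each other.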
APPENDIX B RAW TRANSLATION MODE

RAW mode means no translation should be performed. In RAW mode the server operating system should treat the file as a character file and use the same data formatting that would be appropriate for a character file, but transfer the actual binary values of the character codes.
APPENDIX C SUPER-IMAGE TRANSLATION MODE

SUPER-IMAGE mode is intended for use by PDP-10 family machines only. It is included largely as an illustration of a system-dependent extension. A server machine that has 8-bit bytes should treat SUPER-IMAGE mode the same as NORMAL mode.

In this section, all numbers designating values of character codes are to be interpreted in octal. The notation "x in c1..c2" means "for all character codes x such that c1 <= x <= c2."

SUPER-IMAGE mode suppresses the use of the 177 (Rubout) character as an escape character. Character translation should be done as in NORMAL mode, with one exception: when a two-character sequence beginning with 177 is detected, the 177 should not be output at all. That is, for each entry beginning with a 177 in the PDP-10 character column of the NORMAL translation table, the translation of the NFILE character has the 177 removed.

TABLE 5. SUPER-IMAGE TRANSLATION FROM NFILE TO ASCII

   NFILE character      PDP-10 character(s)
   x in 000..177        x
   x in 200..214        x - 200
   215                  015 012
   x in 216..376        x - 200
   377                  no corresponding code
TABLE 6. SUPER-IMAGE TRANSLATION FROM ASCII TO NFILE

   PDP-10 character     NFILE character
   x in 000..007        x
   x in 010..012        x + 200
   013                  013
   014                  214
   015 012              215
   015 not-012          115
   x in 016..176        x
   177 177              177
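The NFILE-to-PDP-10 direction of SUPER-IMAGE mode (Table 5) can be sketched in Python in the same style as a NORMAL-mode translator. The function name is hypothetical and the input is assumed to be a list of NFILE character codes in 000..377. Because the 177 quoting prefix is dropped, distinct NFILE characters can map to the same PDP-10 character (for example, 011 and 211 both become 011), so SUPER-IMAGE output cannot in general be translated back unambiguously.

```python
def nfile_to_pdp10_super_image(codes):
    """Table 5: NORMAL-mode translation with the 177 quoting prefix suppressed."""
    out = []
    for x in codes:
        if x <= 0o177:
            out.append(x)                 # 000..177 pass through unchanged
        elif x == 0o215:
            out += [0o015, 0o012]         # Return still becomes CRLF
        elif x == 0o377:
            raise ValueError("NFILE 377 has no PDP-10 representation")
        else:
            out.append(x - 0o200)         # 200..214 and 216..376
    return out
```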
NOTES

1. NFILE's requirement for using the NFILE character set is recognized as a drawback for non-Symbolics machines. A useful extension to NFILE would be a provision to make the character set negotiable.
2. Implementation note: Care must be taken that the freeing is done before the control connection is allowed to process another command, or else the control connection may find the data channel to be falsely indicated as being in use.
3. The Symbolics operating system has the policy that whenever the user side is waiting for the server side, a user abort can occur. This user side waiting can occur in any context, such as awaiting a response, waiting in the middle of reading network input, or waiting in the middle of transmitting network output. Thus there are no "hung" states.
4. Note that the Token List Transport Layer supplies a special token to indicate Boolean truth, but no corresponding token to indicate Boolean falsity. NFILE uses an empty token list to indicate Boolean falsity. The historical reason for this asymmetry is the inability of the Lisp language to differentiate between the empty list and NIL, which is traditionally used to mean Boolean falsity. If the flexibility of both a Boolean falsity and an empty token list were allowed, it would create problems for an operating system that cannot distinguish between the two. This aspect of the protocol is recognized as a concession to the Lisp language. The unfortunate effect is that no operating system can distinguish between Boolean falsity and an empty list in NFILE transmissions.
5. No so-called "fat strings" can be sent.