The syntax of the above argument fields (using BNF notation where applicable) is:

   <username> ::= <string>
   <password> ::= <string>
   <account information> ::= <string>
   <string> ::= <char> | <char><string>
   <char> ::= any of the 128 ASCII characters except <CR> and <LF>
   <marker> ::= <pr string>
   <pr string> ::= <pr char> | <pr char><pr string>
   <pr char> ::= printable characters, any ASCII code 33 through 126
   <byte size> ::= any decimal integer 1 through 255
   <Host-port> ::= <Host-number>,<Port-number>
   <Host-number> ::= <number>,<number>,<number>,<number>
   <Port-number> ::= <number>,<number>
   <number> ::= any decimal integer 0 through 255
   <ident> ::= <string>
   <scheme> ::= R | T | ?
   <form code> ::= N | T | C
   <type code> ::= A [<SP> <form code>]
                 | E [<SP> <form code>]
                 | I
                 | L <SP> <byte size>
   <structure code> ::= F | R | P
   <mode code> ::= S | B | C
   <pathname> ::= <string>
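As an illustration of the <Host-port> and <type code> productions above, the following Python sketch parses a PORT-style argument and checks a TYPE argument. The function names are hypothetical. The grammar itself only fixes that each <number> is a decimal integer 0 through 255; combining the two port numbers with the first as the high-order byte is the usual FTP convention and is an assumption of this sketch.

```python
import re

DIGITS = re.compile(r"[0-9]+")
# <type code> ::= A [<SP> <form code>] | E [<SP> <form code>] | I | L <SP> <byte size>
TYPE_CODE = re.compile(r"[AE](?: [NTC])?|I|L ([0-9]+)")


def parse_number(field: str) -> int:
    """<number> ::= any decimal integer 0 through 255"""
    if not DIGITS.fullmatch(field):
        raise ValueError(f"not a decimal integer: {field!r}")
    value = int(field)
    if value > 255:
        raise ValueError(f"out of range 0..255: {value}")
    return value


def parse_host_port(arg: str) -> tuple[str, int]:
    """<Host-port> ::= <Host-number>,<Port-number>, i.e. six <number>s."""
    fields = arg.split(",")
    if len(fields) != 6:
        raise ValueError("expected six comma-separated numbers")
    numbers = [parse_number(f) for f in fields]
    host = ".".join(str(n) for n in numbers[:4])
    # Assumption: the first port number is the high-order byte.
    port = numbers[4] * 256 + numbers[5]
    return host, port


def is_type_code(arg: str) -> bool:
    """True if arg matches <type code>; <byte size> must be 1 through 255."""
    match = TYPE_CODE.fullmatch(arg)
    if match is None:
        return False
    byte_size = match.group(1)   # only set for the "L <SP> <byte size>" form
    return byte_size is None or 1 <= int(byte_size) <= 255
```

For example, the argument "10,0,0,1,4,1" names host 10.0.0.1 and port 4 * 256 + 1 = 1025, and "L 36" is a valid type code while "L 0" is not.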
SEQUENCING OF COMMANDS AND REPLIES

The communication between the user and server is intended to be an alternating dialogue. As such, the user issues an FTP command and the server responds with a prompt primary reply. The user should wait for this initial primary success or failure response before sending further commands.

Certain commands require a second reply for which the user should also wait. These replies may, for example, report on the progress or completion of a file transfer or on the closing of the data connection. They are secondary replies to file transfer commands.

One important group of informational replies is the connection greetings. Under normal circumstances, a server will send a 220 reply, "awaiting input", when the connection is completed. The user should wait for this greeting message before sending any commands. If the server is unable to accept input right away, it should send a 120 "expected delay" reply immediately and a 220 reply when ready. The user will then know not to hang up if there is a delay.

The table below lists alternative success and failure replies for each command. These must be strictly adhered to; a server may substitute text in the replies, but the meaning and action implied by the code numbers and by the specific command-reply sequence cannot be altered.

Command-Reply Sequences

In this section, the command-reply sequence is presented. Each command is listed with its possible replies; command groups are listed together. Preliminary replies are listed first (with their succeeding replies indented under them), then positive and negative completion replies, and finally intermediary replies with the remaining commands from the sequence following. This listing forms the basis for the state diagrams, which will be presented separately.

   Connection Establishment
      120
         220
      220
      421
   Login
      USER
         230
         530
         500, 501, 421
         331, 332
      PASS
         230
         202
         530
         500, 501, 503, 421
         332
      ACCT
         230
         202
         530
         500, 501, 503, 421
   Logout
      QUIT
         221
         500
      REIN
         120
            220
         220
         421
         500, 502
   Transfer parameters
      PORT
         200
         500, 501, 421, 530
      PASV
         227
         500, 501, 502, 421, 530
      MODE, TYPE, STRU
         200
         500, 501, 504, 421, 530
   File action commands
      ALLO
         200
         202
         500, 501, 504, 421, 530
      REST
         500, 501, 502, 421, 530
         350
      STOR
         125, 150
            (110)
            226, 250
            425, 426, 451, 551, 552
         532, 450, 452, 553
         500, 501, 421, 530
      RETR
         125, 150
            (110)
            226, 250
            425, 426, 451
         450, 550
         500, 501, 421, 530
      LIST, NLST
         125, 150
            226, 250
            425, 426, 451
         450
         500, 501, 502, 421, 530
      APPE
         125, 150
            (110)
            226, 250
            425, 426, 451, 551, 552
         532, 450, 550, 452, 553
         500, 501, 502, 421, 530
      MLFL
         125, 150, 151, 152
            226, 250
            425, 426, 451, 552
         532, 450, 550, 452, 553
         500, 501, 502, 421, 530
      RNFR
         450, 550
         500, 501, 502, 421, 530
         350
            RNTO
               250
               532, 553
               500, 501, 502, 503, 421, 530
      DELE, CWD
         250
         450, 550
         500, 501, 502, 421, 530
      ABOR
         225, 226
         500, 501, 502, 421
      MAIL, MSND
         151, 152
            354
               250
               451, 552
         354
            250
            451, 552
         450, 550, 452, 553
         500, 501, 502, 421, 530
      MSOM, MSAM
         119, 151, 152
            354
               250
               451, 552
         354
            250
            451, 552
         450, 550, 452, 553
         500, 501, 502, 421, 530
      MRSQ
         200, 215
         500, 501, 502, 421, 530
      MRCP
         151, 152
            200
         200
         450, 550, 452, 553
         500, 501, 502, 503, 421
   Informational commands
      STAT
         211, 212, 213
         450
         500, 501, 502, 421, 530
      HELP
         211, 214
         500, 501, 502, 421
   Miscellaneous commands
      SITE
         200
         202
         500, 501, 530
      NOOP
         200
         500
         421
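As a sketch of how a user program might use the listing above, the following Python parses a single-line reply into its code and text and checks the code against an excerpt of the table. Only the first replies of a few commands are copied, with the nesting flattened; the table name, the function names, and the restriction to single-line replies are assumptions of this sketch.

```python
import re

# Excerpt of the command-reply listing: every reply each command may
# return as its first reply (nesting flattened; illustrative only).
FIRST_REPLIES = {
    "USER": {230, 530, 500, 501, 421, 331, 332},
    "PASS": {230, 202, 530, 500, 501, 503, 421, 332},
    "PORT": {200, 500, 501, 421, 530},
    "RNFR": {450, 550, 500, 501, 502, 421, 530, 350},
    "NOOP": {200, 500, 421},
}

REPLY_RE = re.compile(r"([1-5][0-9][0-9]) ?(.*)")


def parse_reply(line: str) -> tuple[int, str]:
    """Split a single-line reply such as '200 Command OK' into code and text."""
    match = REPLY_RE.fullmatch(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"malformed reply: {line!r}")
    return int(match.group(1)), match.group(2)


def reply_allowed(command: str, line: str) -> bool:
    """Servers may vary the text, so only the code is checked."""
    code, _text = parse_reply(line)
    return code in FIRST_REPLIES[command]
```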
STATE DIAGRAMS

Here we present state diagrams for a very simple-minded FTP implementation. Only the first digit of the reply codes is used. There is one state diagram for each group of FTP commands or command sequences.

The command groupings were determined by constructing a model for each command and then collecting together the commands with structurally identical models.

For each command or command sequence there are three possible outcomes: success (S), failure (F), and error (E). In the state diagrams below we use the symbol B for "begin" and the symbol W for "wait for reply". Each transition is written as FROM --label--> TO; the label on a transition out of a W state is the first digit of the reply received.

We first present the diagram that represents the largest group of FTP commands:

      B  --cmd-->  W
      W  --1,3-->  E
      W  --2-->    S
      W  --4,5-->  F

This diagram models the commands: ABOR, ALLO, DELE, CWD, HELP, MODE, MRCP, MRSQ, NOOP, PASV, QUIT, SITE, PORT, STAT, STRU, TYPE.
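A minimal sketch of this first diagram, assuming the reply code has already been parsed into an integer (the function name is illustrative):

```python
def simple_outcome(reply_code: int) -> str:
    """Outcome of a first-group command (e.g. NOOP, TYPE, PORT) from its reply."""
    first_digit = reply_code // 100
    if first_digit == 2:
        return "S"      # success
    if first_digit in (4, 5):
        return "F"      # failure
    if first_digit in (1, 3):
        return "E"      # unexpected for this group: error
    raise ValueError(f"not a reply code: {reply_code}")
```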
The other large group of commands is represented by a very similar diagram:

      B  --cmd-->  W
      W  --1-->    W
      W  --2-->    S
      W  --3-->    E
      W  --4,5-->  F

This diagram models the commands: APPE, LIST, MLFL, NLST, REIN, RETR, STOR.

Note that this second model could also be used to represent the first group of commands, the only difference being that in the first group the 100-series replies are unexpected and therefore treated as errors, while the second group expects (and some commands may require) 100-series replies.

The remaining diagrams model command sequences; perhaps the simplest of these is the rename sequence:

      B   --RNFR-->  W1
      W1  --1,2-->   E
      W1  --3-->     B2 (ready to send RNTO)
      W1  --4,5-->   F
      B2  --RNTO-->  W2
      W2  --1,3-->   E
      W2  --2-->     S
      W2  --4,5-->   F
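The second command group differs from the first only in that a 1yz reply keeps the user in W. A sketch of that wait loop follows; the function name and the iterable-of-codes interface are assumptions, standing in for reading successive replies from the control connection.

```python
from collections.abc import Iterable


def transfer_outcome(reply_codes: Iterable[int]) -> str:
    """Consume replies for a second-group command (e.g. RETR, STOR, REIN)."""
    for code in reply_codes:
        first_digit = code // 100
        if first_digit == 1:
            continue            # preliminary reply (e.g. 150): stay in W
        if first_digit == 2:
            return "S"
        if first_digit == 3:
            return "E"
        if first_digit in (4, 5):
            return "F"
        raise ValueError(f"not a reply code: {code}")
    raise EOFError("no final reply before the replies ran out")
```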
A very similar diagram models the Mail and Send commands:

      B   --cmd-->   W1
      W1  --1-->     W1
      W1  --2-->     E
      W1  --3-->     B2 (ready to send text)
      W1  --4,5-->   F
      B2  --text-->  W2
      W2  --1,3-->   E
      W2  --2-->     S
      W2  --4,5-->   F

This diagram models the commands: MAIL, MSND, MSOM, MSAM. Note that the "text" here is a series of lines sent from the user to the server with no response expected until the last line is sent; recall that the last line must consist only of a single period.
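The command and text steps of this diagram can be sketched as follows. The helper names are hypothetical, and refusing a message line that consists only of a period is this sketch's own choice: such a line would end the text early, and this part of the specification does not say how it should be sent.

```python
CRLF = "\r\n"


def mail_command(recipient: str) -> str:
    """MAIL <SP> <recipient name> <CRLF>"""
    if "\r" in recipient or "\n" in recipient:
        raise ValueError("recipient name may not contain CR or LF")
    return f"MAIL {recipient}{CRLF}"


def mail_text(lines: list[str]) -> str:
    """Text sent after the 3yz reply, ended by a line holding only '.'."""
    out = []
    for line in lines:
        if "\r" in line or "\n" in line:
            raise ValueError("each text line must be free of CR and LF")
        if line == ".":
            # A lone period would end the message early; this sketch refuses it.
            raise ValueError("a line consisting only of '.' would end the text")
        out.append(line + CRLF)
    out.append("." + CRLF)
    return "".join(out)
```

A user-FTP would send mail_command(...), wait in W1 through any 1yz replies until the 354, then send mail_text(...) and wait for the final reply.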
The next diagram is a simple model of the Restart command:

      B   --REST-->  W1
      W1  --1,2-->   E
      W1  --3-->     B2 (ready to send cmd)
      W1  --4,5-->   F
      B2  --cmd-->   W2
      W2  --1-->     W2
      W2  --2-->     S
      W2  --3-->     E
      W2  --4,5-->   F

where "cmd" is APPE, STOR, RETR, or MLFL.

We note that the above three models are similar. The Mail diagram and the Rename diagram are structurally identical except that the Mail diagram stays in W on a 100-series reply to its first command, and the Restart diagram differs from the Rename diagram only in the treatment of 100-series replies at the second stage.
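The Rename, Mail and Restart diagrams can all be driven by one table-based loop. In this sketch each stage is a dictionary from the first reply digit to an action, where "WAIT" stays in W and "NEXT" moves on to the next command of the sequence; the names and this encoding are assumptions, and the tables are transcribed from the three diagrams above.

```python
from collections.abc import Iterator, Sequence

# One table per stage, keyed by the first digit of the reply code.
RENAME = (
    {1: "E", 2: "E", 3: "NEXT", 4: "F", 5: "F"},     # after RNFR
    {1: "E", 2: "S", 3: "E", 4: "F", 5: "F"},        # after RNTO
)
MAIL = (
    {1: "WAIT", 2: "E", 3: "NEXT", 4: "F", 5: "F"},  # after MAIL, MSND, MSOM, MSAM
    {1: "E", 2: "S", 3: "E", 4: "F", 5: "F"},        # after the text
)
RESTART = (
    {1: "E", 2: "E", 3: "NEXT", 4: "F", 5: "F"},     # after REST
    {1: "WAIT", 2: "S", 3: "E", 4: "F", 5: "F"},     # after APPE, STOR, RETR, MLFL
)


def run_sequence(stages: Sequence[dict[int, str]], replies: Iterator[int]) -> str:
    """Follow a command sequence; replies yields reply codes in arrival order."""
    for table in stages:
        action = "WAIT"
        while action == "WAIT":
            action = table[next(replies) // 100]
        if action != "NEXT":
            return action
        # A real user-FTP would transmit the next command of the sequence here.
    raise ValueError("the last stage must not answer NEXT")
```

Encoding the diagrams as data makes their differences visible: RENAME and MAIL differ only in the first stage's entry for 1, and RENAME and RESTART only in the second stage's entry for 1.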
The most complicated diagram is for the Login sequence:

      B   --USER-->  W1
      W1  --1-->     E
      W1  --2-->     S
      W1  --3-->     B2 (ready to send PASS)
      W1  --4,5-->   F
      B2  --PASS-->  W2
      W2  --1-->     E
      W2  --2-->     S
      W2  --3-->     B3 (ready to send ACCT)
      W2  --4,5-->   F
      B3  --ACCT-->  W3
      W3  --1,3-->   E
      W3  --2-->     S
      W3  --4,5-->   F
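The Login diagram can be sketched as a loop over USER, PASS and ACCT in which a 3yz reply leads to the next command. The `send` callable, the function name, and the decision to report failure when the server asks for ACCT but no account is known are assumptions of this sketch.

```python
from collections.abc import Callable
from typing import Optional


def login(send: Callable[[str], int], user: str, password: str,
          account: Optional[str] = None) -> str:
    """Walk the Login diagram; returns "S", "F" or "E".

    send(command) is assumed to transmit one command and return the
    code of the reply it gets back.
    """
    steps = [("USER", user), ("PASS", password), ("ACCT", account)]
    for position, (verb, argument) in enumerate(steps):
        if argument is None:
            return "F"               # assumed policy: ACCT requested but none known
        digit = send(f"{verb} {argument}") // 100
        if digit == 2:
            return "S"               # e.g. 230 User logged in
        if digit in (4, 5):
            return "F"
        if digit == 3 and position < len(steps) - 1:
            continue                 # 331/332: go on to PASS or ACCT
        return "E"                   # 1yz at any stage, or 3yz after ACCT
    return "E"                       # not reached: the ACCT stage always returns
```

With the replies from the typical scenario that follows (331 to USER Doe, 230 to PASS mumble), this returns "S".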
Finally we present a generalized diagram that could be used to model the command and reply interchange. Here C denotes the state in which the user is ready to issue a command:

      Begin      -->  C
      C   --cmd-->    W
      W   --1-->      W
      W   --2-->      S
      W   --3-->      C
      W   --4,5-->    F
      S          -->  C
      F          -->  C
      C          -->  End
TYPICAL FTP SCENARIO

User at Host U wanting to transfer files to/from Host S:

In general the user will communicate to the server via a mediating user-FTP process. The following may be a typical scenario. The user-FTP prompts are shown in parentheses, '---->' represents commands from Host U to Host S, and '<----' represents replies from Host S to Host U.

   LOCAL COMMANDS BY USER           ACTION INVOLVED

   ftp (host) multics<CR>           Connect to Host S, port L,
                                    establishing TELNET connections
                                    <---- 220 Service ready <CRLF>
   username Doe <CR>                USER Doe<CRLF>---->
                                    <---- 331 User name ok,
                                          need password<CRLF>
   password mumble <CR>             PASS mumble<CRLF>---->
                                    <---- 230 User logged in.<CRLF>
   retrieve (local type) ASCII<CR>
   (local pathname) test 1 <CR>     User-FTP opens local file in ASCII.
   (for.pathname) test.pl1<CR>      RETR test.pl1<CRLF> ---->
                                    <---- 150 File status okay;
                                          about to open data connection<CRLF>
                                    Server makes data connection
                                    to port U.
                                    <---- 226 Closing data connection,
                                          file transfer successful<CRLF>
   type Image<CR>                   TYPE I<CRLF> ---->
                                    <---- 200 Command OK<CRLF>
   store (local type) image<CR>
   (local pathname) file dump<CR>   User-FTP opens local file in Image.
   (for.pathname) >udd>cn>fd<CR>    STOR >udd>cn>fd<CRLF> ---->
                                    <---- 450 Access denied<CRLF>
   terminate                        QUIT <CRLF> ---->
                                    Server closes all connections.
CONNECTION ESTABLISHMENT

The FTP control connection is established via TCP between the user process port U and the server process port L. This protocol is assigned the service port 21 (25 octal), that is, L=21.
APPENDIX ON MAIL

The basic commands for transmitting mail are the MAIL and the MLFL commands. These commands cause the transmitted data to be entered into the recipient's mailbox.

   MAIL <SP> <recipient name> <CRLF>

      If accepted, returns a 354 reply and considers all succeeding lines to be the message text, terminated by a line containing only a period, upon which a 250 completion reply is returned. Various errors are possible.

   MLFL <SP> <recipient name> <CRLF>

      If accepted, acts like a STOR command, except that the data is considered to be the message text. Various errors are possible.

There are two possible preliminary replies that a server may use to indicate that it is accepting mail for a user whose mailbox is not at that server.

   151 User not local; Will forward to <user>@<host>.

      This reply indicates that the server knows the user's mailbox is on another host and will take responsibility for forwarding the mail to that host. For example, at BBN (or ISI) there are several hosts, each of which has a list of many of the users on several of the hosts. These hosts can then accept mail for any user on their list and forward it to the correct host.

   152 User Unknown; Mail will be forwarded by the operator.

      This reply indicates that the host does not recognize the user name, but that it will accept the mail and have the operator attempt to deliver it. This is useful if the user name is misspelled, but may be a disservice if the mail is really undeliverable.

Three FTP commands provide for "sending" a message to a logged-in user's terminal, as well as variants for mailing it normally whether the user is logged in or not.
   MSND -- SeND to terminal. Returns a 450 failure reply if the addressee is refusing messages or is not logged in.

   MSOM -- Send to terminal Or Mailbox. Returns a 119 notification reply if the terminal is not accessible.

   MSAM -- Send to terminal And Mailbox. Returns a 119 notification reply if the terminal is not accessible.

Note that for MSOM and MSAM, it is the mailing which determines success, not the sending, although MSOM as implemented uses a 119 reply (in addition to the normal success/failure code) to indicate that because the SEND failed, an attempt is being made to mail the message instead. There are no corresponding variants for MLFL, since messages transmitted in this way are generally short.

There are two FTP commands which allow one to mail the text of a message to several recipients simultaneously; such message transmission is far more efficient than the practice of sending the text again and again for each additional recipient at a site.

There are two basic ways of sending a single text to several recipients. In one, all recipients are specified first, and then the text is sent; in the other, the order is reversed and the text is sent first, followed by the recipients. Both schemes are necessary because neither by itself is optimal for all systems, as will be explained later. To select a particular scheme, the MRSQ command is used; to specify recipients after a scheme is chosen, MRCP commands are given; and to furnish text, the MAIL or MLFL commands are used.

Scheme Selection: MRSQ

MRSQ is the means by which a user program can test for implementation of MRSQ/MRCP, select a particular scheme, reset its state, and even do some rudimentary negotiation. Its format is like that of the TYPE command, as follows:
   MRSQ [<SP> <scheme>] <CRLF>

   <scheme> = a single character. The following are defined:

      R   Recipients first. If not implemented, T must be.
      T   Text first. If this is not implemented, R must be.
      ?   Request for preference. Must always be implemented.

   No argument means a "selection" of none of the schemes (the default).

   Replies:

      200 OK, we'll use specified scheme.
      215 <scheme> This is the scheme I prefer.
      501 I understand MRSQ but can't use that scheme.
      5xx Command unrecognized or unimplemented.

Three aspects of MRSQ need to be pointed out here. The first is that an MRSQ with no argument must always return a 200 reply and restore the default state of having no scheme selected. Any other reply implies that MRSQ and hence MRCP are not understood or cannot be performed correctly.

The second is that the use of "?" as a <scheme> asks the FTP server to return a 215 reply in which the server specifies a "preferred" scheme. The format of this reply is simple:

   215 <SP> <scheme> [<SP> <arbitrary text>] <CRLF>

Any other reply (e.g. 4xx or 5xx) implies that MRSQ and MRCP are not implemented, because "?" must always be implemented if MRSQ is.

The third important thing about MRSQ is that it always has the side effect of resetting all schemes to their initial state. This reset must be done no matter what the reply will be: 200, 215, or 501. The actions necessary for a reset will be explained when discussing how each scheme actually works.

Message Text Specification: MAIL/MLFL

Regardless of which scheme (if any) has been selected, a MAIL or MLFL with a non-null argument will behave exactly as before; the MRSQ/MRCP commands have no effect on them. However, such normal MAIL/MLFL commands do have the same side effect as MRSQ; they "reset" the current scheme to its initial state.
It is only when the argument is null (e.g. MAIL<CRLF> or MLFL<CRLF>) that the particular scheme being used is important, because rather than producing an error (as most servers currently do), the server will accept message text for this "null" specification; what it does with it depends on which scheme is in effect, and will be described in "Scheme Mechanics".

Recipient Specification: MRCP

In order to specify recipient names (i.e., idents) and receive some acknowledgment (or refusal) for each name, the following command is used:

   MRCP <SP> <ident> <CRLF>

   Reply for no scheme:

      503 No scheme specified yet; use MRSQ.

   Replies for scheme T are identical to those for MAIL/MLFL.

   Replies for scheme R (recipients first):

      200 OK, name stored.
      452 Recipient table full, this name not stored.
      553 Recipient name rejected.
      4xx Temporary error, try this name again later.
      5xx Permanent error, report to sender.

Note that use of this command is an error if no scheme has been selected yet; an MRSQ <scheme> must have been given if MRCP is to be used.

Scheme Mechanics: MRSQ R (Recipients first)

In the recipients-first scheme, MRCP is used to specify names which the FTP server stores in a list or table. Normally the reply for each MRCP will be either a 200 for acceptance, or a 4xx/5xx code for rejection; all 5xx codes are permanent rejections (e.g. user not known) which should be reported to the human sender, whereas 4xx codes in general connote some temporary error that may be rectified later. None of the 4xx/5xx replies affects previous or succeeding MRCP commands, except for 452, which indicates that no further MRCP's will succeed unless a message is sent to the already stored recipients or a reset is done.

Sending message text to stored recipients is done by giving a MAIL or MLFL command with no argument; that is, just MAIL<CRLF> or MLFL<CRLF>. Transmission of the message text is exactly the same as for normal MAIL/MLFL; however, a positive acknowledgment at the end of transmission means that the message has been sent to ALL recipients that were remembered with MRCP, and a failure code means that it should be considered to have failed for ALL of these specified recipients. This applies regardless of the actual error code; and whether the reply signifies success or failure, all stored recipient names are flushed and forgotten; in other words, things are reset to their initial state. This purging of the recipient name list must also be done as the "reset" side effect of any use of MRSQ.

A 452 reply to an MRCP can thus be handled by using a MAIL/MLFL to specify the message for currently stored recipients, and then sending more MRCP's and another MAIL/MLFL, as many times as necessary; for example, if a server only had room for 10 names, a 50-recipient message would be sent 5 times, to 10 different recipients each time.

If a user attempts to specify message text (MAIL/MLFL with no argument) before any successful MRCP's have been given, this should be treated exactly as a "normal" MAIL/MLFL with a null recipient would be; some servers will return an error of some type, such as "550 Null recipient". See Example 1 for an example using MRSQ R.

Scheme Mechanics: MRSQ T (Text first)

In the text-first scheme, MAIL/MLFL with no argument is used to specify message text, which the server stores away. Succeeding MRCP's are then treated as if they were MAIL/MLFL commands, except that none of the text transfer manipulations are done; the stored message text is sent to the specified recipient, and a reply code is returned identical to that which an actual MAIL/MLFL would invoke. (Note that ANY 2xx code indicates success.)

The stored message text is not forgotten until the next MAIL/MLFL or MRSQ, which will either replace it with new text or flush it entirely. Any use of MRSQ will reset this scheme by flushing stored text, as will any use of MAIL/MLFL with a non-null argument.
If an MRCP is seen before any message text has been stored, the user in effect is trying to send a null message; some servers might allow this, others would return an error code. See Example 2 for an example using MRSQ T.
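The server side of both schemes, including the reset rules, can be sketched as a small state object. Everything here is an assumption for illustration: the class and method names, the `hand_off` hook into the local mail system, the choice to prefer T when answering "MRSQ ?", and the 550 reply for a null message or null recipient list.

```python
from collections.abc import Callable
from typing import Optional


class MultiRecipientServer:
    """Hypothetical server-side state for MRSQ, MRCP and argument-less MAIL.

    hand_off(recipients, text) -> reply code is an assumed hook into the
    local mail system; 2yz means the text reached every listed recipient.
    """

    def __init__(self, hand_off: Callable[[list[str], str], int], table_size: int = 10):
        self.hand_off = hand_off
        self.table_size = table_size
        self.scheme: Optional[str] = None      # None (default), "R" or "T"
        self.reset()

    def reset(self) -> None:
        """Flush scheme data but keep the selected scheme."""
        self.recipients: list[str] = []        # scheme R: stored names
        self.text: Optional[str] = None        # scheme T: stored message text

    def mrsq(self, arg: Optional[str]) -> int:
        self.reset()                           # every MRSQ resets, whatever the reply
        if arg is None:
            self.scheme = None
            return 200
        if arg == "?":
            return 215                         # sent as "215 T ..." (preference assumed)
        if arg in ("R", "T"):
            self.scheme = arg
            return 200
        return 501

    def mail_with_argument(self) -> None:
        """A normal MAIL/MLFL <recipient>: only its reset side effect is modeled."""
        self.reset()

    def mrcp(self, name: str) -> int:
        if self.scheme is None:
            return 503
        if self.scheme == "R":
            if len(self.recipients) >= self.table_size:
                return 452
            self.recipients.append(name)
            return 200
        if self.text is None:
            return 550                         # null message refused (servers may differ)
        return self.hand_off([name], self.text)  # the text stays stored

    def mail_without_argument(self, text: str) -> int:
        """MAIL<CRLF> once the message text has been received."""
        if self.scheme == "R" and self.recipients:
            code = self.hand_off(self.recipients, text)
            self.reset()                       # flushed on success and on failure
            return code
        if self.scheme == "T":
            self.text = text                   # replaces any earlier stored text
            return 250
        return 550                             # like a MAIL with a null recipient
```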
Why two schemes anyway? Because neither by itself is optimal for all systems. MRSQ R allows more of a "bulk" mailing, because everything is saved up and then mailed simultaneously; this is very useful for systems such as ITS where the FTP server does not itself write mail directly, but hands it on to a central mailer demon of great power; the more information (e.g. recipients) associated with a single "hand-off", the more efficiently mail can be delivered.

By contrast, MRSQ T is geared to FTP servers which want to deliver mail directly, in one-by-one incremental fashion. This way they can return an individual success/failure reply code for each recipient given, which may depend on variable file system factors such as exceeding disk allocation, mailbox access conflicts, and so forth; if they tried to emulate MRSQ R's bulk mailing, they would have to ensure that a success reply to the MAIL/MLFL indeed meant that it had been delivered to ALL recipients specified, not just some.

Notes:

* Because these commands are not required in the minimum implementation of FTP, one must be prepared to deal with sites which don't recognize either MRSQ or MRCP. "MRSQ" and "MRSQ ?" are explicitly designed as tests to see whether either scheme is implemented; MRCP is not, and a failure return of the "unimplemented" variety could be confused with "No scheme selected yet", or even with "Recipient unknown". Be safe, be sure, use MRSQ!

* There is no way to indicate in a positive response to "MRSQ ?" that the preferred "scheme" for a server is that of the default state; i.e. none of the multi-recipient schemes. The rationale is that in this case, it would be pointless to implement MRSQ/MRCP at all, and the response would therefore be negative.
* One reason that the use of MAIL/MLFL is restricted to null arguments with this multi-recipient extension is the ambiguity that would result if a non-null argument were allowed; for example, if MRSQ R was in effect and some MRCP's had been given, and a MAIL FOO<CRLF> was done, there would be no way to distinguish a failure reply for mailbox "FOO" from a global failure for all recipients specified. A similar situation exists for MRSQ T; it would not be clear whether the text was stored and the mailbox failed, or vice versa, or both.
* "Resets" are done by all MRSQ's and "normal" MAIL/MLFL's to avoid confusion and overly complicated implementation. The MRSQ command implies a change or uncertainty of status, and the latter commands would otherwise have to use some independent mechanisms to avoid clobbering the data bases (e.g., message text storage area) used by the T/R schemes. However, once a scheme is selected, it remains "in effect" just as a "TYPE A" remains selected. The recommended way for doing a reset, without changing the current selection, is with "MRSQ ?". Remember that "MRSQ" alone reverts to the no-scheme state.

* It is permissible to intersperse other FTP commands among the MRSQ/MRCP/MAIL sequences.
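Following the first note, a user program might probe for the extension as below before ever sending MRCP. The function names are hypothetical, and treating an unknown scheme letter in a 215 reply as unusable is an assumption of this sketch.

```python
from typing import Optional


def mrsq_supported(reply_code: int) -> bool:
    """A bare 'MRSQ' must answer 200 if MRSQ/MRCP are implemented at all."""
    return reply_code == 200


def preferred_scheme(reply_line: str) -> Optional[str]:
    """Interpret the reply to 'MRSQ ?'.

    Only '215 <SP> <scheme> [<SP> <arbitrary text>]' names a scheme; any
    other reply means MRSQ/MRCP should be treated as unimplemented.
    """
    code, _, rest = reply_line.rstrip("\r\n").partition(" ")
    if code != "215":
        return None
    scheme = rest.split(" ", 1)[0]
    # Assumption: a scheme letter other than R or T is treated as unusable.
    return scheme if scheme in ("R", "T") else None
```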
Example 1

Example of MRSQ R (Recipients first)

This is an example of how MRSQ R is used; first the user must establish that the server in fact implements MRSQ:

   U: MRSQ
   S: 200 OK, no scheme selected.

An MRSQ with a null argument always returns a 200 if implemented, selecting the "scheme" of null, i.e. none of them. If MRSQ were not implemented, a code of 4xx or 5xx would be returned.

   U: MRSQ R
   S: 200 OK, using that scheme

All's well; now the recipients can be specified.

   U: MRCP Foo
   S: 200 OK
   U: MRCP Raboof
   S: 553 Who's that? No such user here.
   U: MRCP bar
   S: 200 OK

Well, two out of three ain't bad. Note that the demise of "Raboof" has no effect on the storage of "Foo" or "bar". Now to furnish the message text, by giving a MAIL or MLFL with no argument:

   U: MAIL
   S: 354 Type mail, ended by <CRLF>.<CRLF>
   U: Blah blah blah blah....etc etc etc
   U: .
   S: 250 Mail sent.

The text has now been sent to both "Foo" and "bar".
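A client for the recipients-first scheme also has to handle 452, as described under the R scheme mechanics: flush the stored batch with an argument-less MAIL, then offer the refused name again. This sketch assumes "MRSQ R" has already been accepted; `mrcp` and `mail_null` are hypothetical callables standing in for the control connection.

```python
from collections.abc import Callable, Iterable


def mail_recipients_first(mrcp: Callable[[str], int],
                          mail_null: Callable[[], int],
                          names: Iterable[str]) -> dict[str, str]:
    """Client side of MRSQ R; returns an outcome for every name offered.

    mrcp(name) sends 'MRCP <name>' and returns the reply code; mail_null()
    sends MAIL<CRLF> plus the text and returns the final reply code.
    """
    results: dict[str, str] = {}
    batch: list[str] = []

    def flush() -> None:
        # One MAIL covers ALL stored names: success or failure applies to each.
        if batch:
            outcome = "sent" if 200 <= mail_null() < 300 else "failed"
            for stored in batch:
                results[stored] = outcome
            batch.clear()

    for name in names:
        code = mrcp(name)
        if code == 452 and batch:
            flush()                        # table full: send to the stored names
            code = mrcp(name)              # then offer this name again
        if code == 200:
            batch.append(name)
        elif 400 <= code < 500:
            results[name] = "retry later"  # temporary error
        else:
            results[name] = "rejected"     # 5yz (or anything unexpected): tell the sender
    flush()
    return results
```

With a server that has room for 10 names, 50 accepted names produce five MAIL commands, matching the arithmetic given for the R scheme.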
Example 2

Example of MRSQ T (Text first)

Using the same message as the previous example:

   U: MRSQ ?
   S: 215 T Text first, please.

MRSQ is indeed implemented, and the server says that it prefers "T", but that needn't stop the user from trying something else:

   U: MRSQ R
   S: 501 Sorry, I really can't do that.

It's possible that it could have understood "R" also, but in general it's best to use the "preferred" scheme, since the server knows which is most efficient for its particular site. Anyway:

   U: MRSQ T
   S: 200 OK, using that scheme.

Scheme "T" is now selected, and the text must be sent:

   U: MAIL
   S: 354 Type mail, ended by <CRLF>.<CRLF>
   U: Blah blah blah blah....etc etc etc
   U: .
   S: 250 Mail stored.

Now recipients can be specified:

   U: MRCP Foo
   S: 250 Stored mail sent.
   U: MRCP Raboof
   S: 553 Who's that? No such user here.
   U: MRCP bar
   S: 250 Stored mail sent.
Again, the text has now been sent to both "Foo" and "bar", and still remains stored. A new message can be sent with another MAIL/MRCP... sequence, but the fastidious or paranoid could choose to do:

   U: MRSQ ?
   S: 215 T Text first, please.

which resets things without altering the scheme in effect.
APPENDIX ON PAGE STRUCTURE

The need for FTP to support page structure derives principally from the need to support efficient transmission of files between TOPS20 systems, particularly the files used by NLS.

The file system of TOPS20 is based on the concept of pages. The system level is most efficient at manipulating files as pages. System level programs provide an interface to the file system so that many applications view files as sequential streams of characters. However, a few applications use the underlying page structures directly, and some of these create holey files.

A TOPS20 file is just a bunch of words pointed to by a page table. If those words contain CRLF's, fine, but that doesn't mean "record" to TOPS20.

A TOPS20 disk file consists of four things: a pathname, a page table, a (possibly empty) set of pages, and a set of attributes.

The pathname is specified in the RETR or STOR command. It includes the directory name, file name, file name extension, and version number.

The page table contains up to 2**18 entries. Each entry may be EMPTY, or may point to a page. If it is not empty, there are also some page-specific access bits; not all pages of a file need have the same access protection.

A page is a contiguous set of 512 words of 36 bits each.

The attributes of the file, in the File Descriptor Block (FDB), contain such things as creation time, write time, read time, writer's byte-size, end of file pointer, count of reads and writes, backup system tape numbers, etc.

Note that there is NO requirement that pages in the page table be contiguous. There may be empty page table slots between occupied ones. Also, the end of file pointer is simply a number. There is no requirement that it in fact point at the "last" datum in the file. Ordinary sequential I/O calls in TOPS20 will cause the end of file pointer to be left after the last datum written, but other operations may cause it not to be so, if a particular programming system so requires.
In fact both of these special cases, "holey" files and end-of-file pointers not at the end of the file, occur with NLS data files.

The TOPS20 paged files can be sent with the FTP transfer parameters: TYPE L 36, STRU P, and MODE S (in fact any mode could be used).

Each page of information has a header. Each header field, which is a logical byte, is a TOPS20 word, since the TYPE is L 36.

The header fields are:

   Word 0: Header Length.

      The header length is 5.

   Word 1: Page Index.

      If the data is a disk file page, this is the number of that page in the file's page map. Empty pages (holes) in the file are simply not sent. Note that a hole is NOT the same as a page of zeros.

   Word 2: Data Length.

      The number of data words in this page, following the header. Thus the total length of the transmission unit is the Header Length plus the Data Length.

   Word 3: Page Type.

      A code for what type of chunk this is. A data page is type 3, the FDB page is type 2.

   Word 4: Page Access Control.

      The access bits associated with the page in the file's page map. (This full word quantity is put into AC2 of an SPACS by the program reading from net to disk.)

After the header are Data Length data words. Data Length is currently either 512 for a data page or 21 for an FDB. Trailing zeros in a disk file page may be discarded, making Data Length less than 512 in that case.
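The header layout above can be sketched with each 36-bit word held as a Python int. The function names are illustrative assumptions; the sketch drops trailing zeros from data pages, sends no unit at all for holes, and ignores the access word when reading units back.

```python
from typing import Optional

DATA_PAGE, FDB_PAGE = 3, 2
HEADER_LENGTH = 5
PAGE_WORDS = 512
MAX_WORD = (1 << 36) - 1               # each logical byte is a 36-bit word (TYPE L 36)


def make_unit(page_index: int, words: list[int], access: int,
              page_type: int = DATA_PAGE) -> list[int]:
    """Build one transmission unit: the 5 header words, then the data words."""
    data = list(words)
    if page_type == DATA_PAGE:
        if len(data) != PAGE_WORDS or not 0 <= page_index < 2 ** 18:
            raise ValueError("a data page is 512 words with a page index below 2**18")
        while data and data[-1] == 0:  # trailing zeros may be discarded
            data.pop()
    unit = [HEADER_LENGTH, page_index, len(data), page_type, access] + data
    if not all(0 <= w <= MAX_WORD for w in unit):
        raise ValueError("every word must fit in 36 bits")
    return unit


def parse_units(stream: list[int]) -> tuple[dict[int, list[int]], Optional[list[int]]]:
    """Split a word stream into {page index: 512 words} and the FDB words."""
    pages: dict[int, list[int]] = {}
    fdb: Optional[list[int]] = None
    i = 0
    while i < len(stream):
        header_len, index, data_len, page_type = stream[i:i + 4]
        data = stream[i + header_len:i + header_len + data_len]
        if page_type == DATA_PAGE:
            pages[index] = data + [0] * (PAGE_WORDS - data_len)  # restore zeros
        elif page_type == FDB_PAGE:
            fdb = data
        i += header_len + data_len
    return pages, fdb
```

A page of zeros still produces a unit (with Data Length 0), while a hole produces none, which is how the receiver can tell them apart.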
Data transfers are implemented like the layers of an onion: some characters are packaged into a line. Some lines are packaged into a file. The file is broken into other manageable units for transmission. Those units have compression applied to them. The units may be flagged by restart markers. On the other end, the process is reversed.