RFC 1945

Hypertext Transfer Protocol -- HTTP/1.0

Pages: 60
Informational
→ Errata

Part 1 of 3 – Pages 1 to 21

RFC1945 - Page 1

Network Working Group                                     T. Berners-Lee
Request for Comments: 1945                                       MIT/LCS
Category: Informational                                      R. Fielding
                                                               UC Irvine
                                                              H. Frystyk
                                                                 MIT/LCS
                                                                May 1996


                Hypertext Transfer Protocol -- HTTP/1.0

Status of This Memo

   This memo provides information for the Internet community.  This memo
   does not specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.

IESG Note:

   The IESG has concerns about this protocol, and expects this document
   to be replaced relatively soon by a standards track document.

Abstract

   The Hypertext Transfer Protocol (HTTP) is an application-level
   protocol with the lightness and speed necessary for distributed,
   collaborative, hypermedia information systems. It is a generic,
   stateless, object-oriented protocol which can be used for many tasks,
   such as name servers and distributed object management systems,
   through extension of its request methods (commands). A feature of
   HTTP is the typing of data representation, allowing systems to be
   built independently of the data being transferred.

   HTTP has been in use by the World-Wide Web global information
   initiative since 1990. This specification reflects common usage of
   the protocol referred to as "HTTP/1.0".

Table of Contents

   1.  Introduction ..............................................  4
       1.1  Purpose ..............................................  4
       1.2  Terminology ..........................................  4
       1.3  Overall Operation ....................................  6
       1.4  HTTP and MIME ........................................  8
   2.  Notational Conventions and Generic Grammar ................  8
       2.1  Augmented BNF ........................................  8
       2.2  Basic Rules .......................................... 10
   3.  Protocol Parameters ....................................... 12

RFC1945 - Page 2

       3.1  HTTP Version ......................................... 12
       3.2  Uniform Resource Identifiers ......................... 14
            3.2.1  General Syntax ................................ 14
            3.2.2  http URL ...................................... 15
       3.3  Date/Time Formats .................................... 15
       3.4  Character Sets ....................................... 17
       3.5  Content Codings ...................................... 18
       3.6  Media Types .......................................... 19
            3.6.1  Canonicalization and Text Defaults ............ 19
            3.6.2  Multipart Types ............................... 20
       3.7  Product Tokens ....................................... 20
   4.  HTTP Message .............................................. 21
       4.1  Message Types ........................................ 21
       4.2  Message Headers ...................................... 22
       4.3  General Header Fields ................................ 23
   5.  Request ................................................... 23
       5.1  Request-Line ......................................... 23
            5.1.1  Method ........................................ 24
            5.1.2  Request-URI ................................... 24
       5.2  Request Header Fields ................................ 25
   6.  Response .................................................. 25
       6.1  Status-Line .......................................... 26
            6.1.1  Status Code and Reason Phrase ................. 26
       6.2  Response Header Fields ............................... 28
   7.  Entity .................................................... 28
       7.1  Entity Header Fields ................................. 29
       7.2  Entity Body .......................................... 29
            7.2.1  Type .......................................... 29
            7.2.2  Length ........................................ 30
   8.  Method Definitions ........................................ 30
       8.1  GET .................................................. 31
       8.2  HEAD ................................................. 31
       8.3  POST ................................................. 31
   9.  Status Code Definitions ................................... 32
       9.1  Informational 1xx .................................... 32
       9.2  Successful 2xx ....................................... 32
       9.3  Redirection 3xx ...................................... 34
       9.4  Client Error 4xx ..................................... 35
       9.5  Server Error 5xx ..................................... 37
   10. Header Field Definitions .................................. 37
       10.1  Allow ............................................... 38
       10.2  Authorization ....................................... 38
       10.3  Content-Encoding .................................... 39
       10.4  Content-Length ...................................... 39
       10.5  Content-Type ........................................ 40
       10.6  Date ................................................ 40
       10.7  Expires ............................................. 41
       10.8  From ................................................ 42

RFC1945 - Page 3

       10.9  If-Modified-Since ................................... 42
       10.10 Last-Modified ....................................... 43
       10.11 Location ............................................ 44
       10.12 Pragma .............................................. 44
       10.13 Referer ............................................. 44
       10.14 Server .............................................. 45
       10.15 User-Agent .......................................... 46
       10.16 WWW-Authenticate .................................... 46
   11. Access Authentication ..................................... 47
       11.1  Basic Authentication Scheme ......................... 48
   12. Security Considerations ................................... 49
       12.1  Authentication of Clients ........................... 49
       12.2  Safe Methods ........................................ 49
       12.3  Abuse of Server Log Information ..................... 50
       12.4  Transfer of Sensitive Information ................... 50
       12.5  Attacks Based On File and Path Names ................ 51
   13. Acknowledgments ........................................... 51
   14. References ................................................ 52
   15. Authors' Addresses ........................................ 54
   Appendix A.   Internet Media Type message/http ................ 55
   Appendix B.   Tolerant Applications ........................... 55
   Appendix C.   Relationship to MIME ............................ 56
       C.1  Conversion to Canonical Form ......................... 56
       C.2  Conversion of Date Formats ........................... 57
       C.3  Introduction of Content-Encoding ..................... 57
       C.4  No Content-Transfer-Encoding ......................... 57
       C.5  HTTP Header Fields in Multipart Body-Parts ........... 57
   Appendix D.   Additional Features ............................. 57
       D.1  Additional Request Methods ........................... 58
            D.1.1  PUT ........................................... 58
            D.1.2  DELETE ........................................ 58
            D.1.3  LINK .......................................... 58
            D.1.4  UNLINK ........................................ 58
       D.2  Additional Header Field Definitions .................. 58
            D.2.1  Accept ........................................ 58
            D.2.2  Accept-Charset ................................ 59
            D.2.3  Accept-Encoding ............................... 59
            D.2.4  Accept-Language ............................... 59
            D.2.5  Content-Language .............................. 59
            D.2.6  Link .......................................... 59
            D.2.7  MIME-Version .................................. 59
            D.2.8  Retry-After ................................... 60
            D.2.9  Title ......................................... 60
            D.2.10 URI ........................................... 60

RFC1945 - Page 4

1.  Introduction

1.1  Purpose

   The Hypertext Transfer Protocol (HTTP) is an application-level
   protocol with the lightness and speed necessary for distributed,
   collaborative, hypermedia information systems. HTTP has been in use
   by the World-Wide Web global information initiative since 1990. This
   specification reflects common usage of the protocol referred too as
   "HTTP/1.0". This specification describes the features that seem to be
   consistently implemented in most HTTP/1.0 clients and servers. The
   specification is split into two sections. Those features of HTTP for
   which implementations are usually consistent are described in the
   main body of this document. Those features which have few or
   inconsistent implementations are listed in Appendix D.

   Practical information systems require more functionality than simple
   retrieval, including search, front-end update, and annotation. HTTP
   allows an open-ended set of methods to be used to indicate the
   purpose of a request. It builds on the discipline of reference
   provided by the Uniform Resource Identifier (URI) [2], as a location
   (URL) [4] or name (URN) [16], for indicating the resource on which a
   method is to be applied. Messages are passed in a format similar to
   that used by Internet Mail [7] and the Multipurpose Internet Mail
   Extensions (MIME) [5].

   HTTP is also used as a generic protocol for communication between
   user agents and proxies/gateways to other Internet protocols, such as
   SMTP [12], NNTP [11], FTP [14], Gopher [1], and WAIS [8], allowing
   basic hypermedia access to resources available from diverse
   applications and simplifying the implementation of user agents.

1.2  Terminology

   This specification uses a number of terms to refer to the roles
   played by participants in, and objects of, the HTTP communication.

   connection

       A transport layer virtual circuit established between two
       application programs for the purpose of communication.

   message

       The basic unit of HTTP communication, consisting of a structured
       sequence of octets matching the syntax defined in Section 4 and
       transmitted via the connection.

RFC1945 - Page 5

   request

       An HTTP request message (as defined in Section 5).

   response

       An HTTP response message (as defined in Section 6).

   resource

       A network data object or service which can be identified by a
       URI (Section 3.2).

   entity

       A particular representation or rendition of a data resource, or
       reply from a service resource, that may be enclosed within a
       request or response message. An entity consists of
       metainformation in the form of entity headers and content in the
       form of an entity body.

   client

       An application program that establishes connections for the
       purpose of sending requests.

   user agent

       The client which initiates a request. These are often browsers,
       editors, spiders (web-traversing robots), or other end user
       tools.

   server

       An application program that accepts connections in order to
       service requests by sending back responses.

   origin server

       The server on which a given resource resides or is to be created.

   proxy

       An intermediary program which acts as both a server and a client
       for the purpose of making requests on behalf of other clients.
       Requests are serviced internally or by passing them, with
       possible translation, on to other servers. A proxy must
       interpret and, if necessary, rewrite a request message before

RFC1945 - Page 6

       forwarding it. Proxies are often used as client-side portals
       through network firewalls and as helper applications for
       handling requests via protocols not implemented by the user
       agent.

   gateway

       A server which acts as an intermediary for some other server.
       Unlike a proxy, a gateway receives requests as if it were the
       origin server for the requested resource; the requesting client
       may not be aware that it is communicating with a gateway.
       Gateways are often used as server-side portals through network
       firewalls and as protocol translators for access to resources
       stored on non-HTTP systems.

   tunnel

       A tunnel is an intermediary program which is acting as a blind
       relay between two connections. Once active, a tunnel is not
       considered a party to the HTTP communication, though the tunnel
       may have been initiated by an HTTP request. The tunnel ceases to
       exist when both ends of the relayed connections are closed.
       Tunnels are used when a portal is necessary and the intermediary
       cannot, or should not, interpret the relayed communication.

   cache

       A program's local store of response messages and the subsystem
       that controls its message storage, retrieval, and deletion. A
       cache stores cachable responses in order to reduce the response
       time and network bandwidth consumption on future, equivalent
       requests. Any client or server may include a cache, though a
       cache cannot be used by a server while it is acting as a tunnel.

   Any given program may be capable of being both a client and a server;
   our use of these terms refers only to the role being performed by the
   program for a particular connection, rather than to the program's
   capabilities in general. Likewise, any server may act as an origin
   server, proxy, gateway, or tunnel, switching behavior based on the
   nature of each request.

1.3  Overall Operation

   The HTTP protocol is based on a request/response paradigm. A client
   establishes a connection with a server and sends a request to the
   server in the form of a request method, URI, and protocol version,
   followed by a MIME-like message containing request modifiers, client
   information, and possible body content. The server responds with a

RFC1945 - Page 7

   status line, including the message's protocol version and a success
   or error code, followed by a MIME-like message containing server
   information, entity metainformation, and possible body content.

   Most HTTP communication is initiated by a user agent and consists of
   a request to be applied to a resource on some origin server. In the
   simplest case, this may be accomplished via a single connection (v)
   between the user agent (UA) and the origin server (O).

          request chain ------------------------>
       UA -------------------v------------------- O
          <----------------------- response chain

   A more complicated situation occurs when one or more intermediaries
   are present in the request/response chain. There are three common
   forms of intermediary: proxy, gateway, and tunnel. A proxy is a
   forwarding agent, receiving requests for a URI in its absolute form,
   rewriting all or parts of the message, and forwarding the reformatted
   request toward the server identified by the URI. A gateway is a
   receiving agent, acting as a layer above some other server(s) and, if
   necessary, translating the requests to the underlying server's
   protocol. A tunnel acts as a relay point between two connections
   without changing the messages; tunnels are used when the
   communication needs to pass through an intermediary (such as a
   firewall) even when the intermediary cannot understand the contents
   of the messages.

          request chain -------------------------------------->
       UA -----v----- A -----v----- B -----v----- C -----v----- O
          <------------------------------------- response chain

   The figure above shows three intermediaries (A, B, and C) between the
   user agent and origin server. A request or response message that
   travels the whole chain must pass through four separate connections.
   This distinction is important because some HTTP communication options
   may apply only to the connection with the nearest, non-tunnel
   neighbor, only to the end-points of the chain, or to all connections
   along the chain. Although the diagram is linear, each participant may
   be engaged in multiple, simultaneous communications. For example, B
   may be receiving requests from many clients other than A, and/or
   forwarding requests to servers other than C, at the same time that it
   is handling A's request.

   Any party to the communication which is not acting as a tunnel may
   employ an internal cache for handling requests. The effect of a cache
   is that the request/response chain is shortened if one of the
   participants along the chain has a cached response applicable to that
   request. The following illustrates the resulting chain if B has a

RFC1945 - Page 8

   cached copy of an earlier response from O (via C) for a request which
   has not been cached by UA or A.

          request chain ---------->
       UA -----v----- A -----v----- B - - - - - - C - - - - - - O
          <--------- response chain

   Not all responses are cachable, and some requests may contain
   modifiers which place special requirements on cache behavior. Some
   HTTP/1.0 applications use heuristics to describe what is or is not a
   "cachable" response, but these rules are not standardized.

   On the Internet, HTTP communication generally takes place over TCP/IP
   connections. The default port is TCP 80 [15], but other ports can be
   used. This does not preclude HTTP from being implemented on top of
   any other protocol on the Internet, or on other networks. HTTP only
   presumes a reliable transport; any protocol that provides such
   guarantees can be used, and the mapping of the HTTP/1.0 request and
   response structures onto the transport data units of the protocol in
   question is outside the scope of this specification.

   Except for experimental applications, current practice requires that
   the connection be established by the client prior to each request and
   closed by the server after sending the response. Both clients and
   servers should be aware that either party may close the connection
   prematurely, due to user action, automated time-out, or program
   failure, and should handle such closing in a predictable fashion. In
   any case, the closing of the connection by either or both parties
   always terminates the current request, regardless of its status.

1.4  HTTP and MIME

   HTTP/1.0 uses many of the constructs defined for MIME, as defined in
   RFC 1521 [5]. Appendix C describes the ways in which the context of
   HTTP allows for different use of Internet Media Types than is
   typically found in Internet mail, and gives the rationale for those
   differences.

2.  Notational Conventions and Generic Grammar

2.1  Augmented BNF

   All of the mechanisms specified in this document are described in
   both prose and an augmented Backus-Naur Form (BNF) similar to that
   used by RFC 822 [7]. Implementors will need to be familiar with the
   notation in order to understand this specification. The augmented BNF
   includes the following constructs:

RFC1945 - Page 9

   name = definition

       The name of a rule is simply the name itself (without any
       enclosing "<" and ">") and is separated from its definition by
       the equal character "=". Whitespace is only significant in that
       indentation of continuation lines is used to indicate a rule
       definition that spans more than one line. Certain basic rules
       are in uppercase, such as SP, LWS, HT, CRLF, DIGIT, ALPHA, etc.
       Angle brackets are used within definitions whenever their
       presence will facilitate discerning the use of rule names.

   "literal"

       Quotation marks surround literal text. Unless stated otherwise,
       the text is case-insensitive.

   rule1 | rule2

       Elements separated by a bar ("I") are alternatives,
       e.g., "yes | no" will accept yes or no.

   (rule1 rule2)

       Elements enclosed in parentheses are treated as a single
       element. Thus, "(elem (foo | bar) elem)" allows the token
       sequences "elem foo elem" and "elem bar elem".

   *rule

       The character "*" preceding an element indicates repetition. The
       full form is "<n>*<m>element" indicating at least <n> and at
       most <m> occurrences of element. Default values are 0 and
       infinity so that "*(element)" allows any number, including zero;
       "1*element" requires at least one; and "1*2element" allows one
       or two.

   [rule]

       Square brackets enclose optional elements; "[foo bar]" is
       equivalent to "*1(foo bar)".

   N rule

       Specific repetition: "<n>(element)" is equivalent to
       "<n>*<n>(element)"; that is, exactly <n> occurrences of
       (element). Thus 2DIGIT is a 2-digit number, and 3ALPHA is a
       string of three alphabetic characters.

RFC1945 - Page 10

   #rule

       A construct "#" is defined, similar to "*", for defining lists
       of elements. The full form is "<n>#<m>element" indicating at
       least <n> and at most <m> elements, each separated by one or
       more commas (",") and optional linear whitespace (LWS). This
       makes the usual form of lists very easy; a rule such as
       "( *LWS element *( *LWS "," *LWS element ))" can be shown as
       "1#element". Wherever this construct is used, null elements are
       allowed, but do not contribute to the count of elements present.
       That is, "(element), , (element)" is permitted, but counts as
       only two elements. Therefore, where at least one element is
       required, at least one non-null element must be present. Default
       values are 0 and infinity so that "#(element)" allows any
       number, including zero; "1#element" requires at least one; and
       "1#2element" allows one or two.

   ; comment

       A semi-colon, set off some distance to the right of rule text,
       starts a comment that continues to the end of line. This is a
       simple way of including useful notes in parallel with the
       specifications.

   implied *LWS

       The grammar described by this specification is word-based.
       Except where noted otherwise, linear whitespace (LWS) can be
       included between any two adjacent words (token or
       quoted-string), and between adjacent tokens and delimiters
       (tspecials), without changing the interpretation of a field. At
       least one delimiter (tspecials) must exist between any two
       tokens, since they would otherwise be interpreted as a single
       token. However, applications should attempt to follow "common
       form" when generating HTTP constructs, since there exist some
       implementations that fail to accept anything beyond the common
       forms.

2.2  Basic Rules

   The following rules are used throughout this specification to
   describe basic parsing constructs. The US-ASCII coded character set
   is defined by [17].

       OCTET          = <any 8-bit sequence of data>
       CHAR           = <any US-ASCII character (octets 0 - 127)>
       UPALPHA        = <any US-ASCII uppercase letter "A".."Z">
       LOALPHA        = <any US-ASCII lowercase letter "a".."z">

RFC1945 - Page 11

       ALPHA          = UPALPHA | LOALPHA
       DIGIT          = <any US-ASCII digit "0".."9">
       CTL            = <any US-ASCII control character
                        (octets 0 - 31) and DEL (127)>
       CR             = <US-ASCII CR, carriage return (13)>
       LF             = <US-ASCII LF, linefeed (10)>
       SP             = <US-ASCII SP, space (32)>
       HT             = <US-ASCII HT, horizontal-tab (9)>
       <">            = <US-ASCII double-quote mark (34)>

   HTTP/1.0 defines the octet sequence CR LF as the end-of-line marker
   for all protocol elements except the Entity-Body (see Appendix B for
   tolerant applications). The end-of-line marker within an Entity-Body
   is defined by its associated media type, as described in Section 3.6.

       CRLF           = CR LF

   HTTP/1.0 headers may be folded onto multiple lines if each
   continuation line begins with a space or horizontal tab. All linear
   whitespace, including folding, has the same semantics as SP.

       LWS            = [CRLF] 1*( SP | HT )

   However, folding of header lines is not expected by some
   applications, and should not be generated by HTTP/1.0 applications.

   The TEXT rule is only used for descriptive field contents and values
   that are not intended to be interpreted by the message parser. Words
   of *TEXT may contain octets from character sets other than US-ASCII.

       TEXT           = <any OCTET except CTLs,
                        but including LWS>

   Recipients of header field TEXT containing octets outside the US-
   ASCII character set may assume that they represent ISO-8859-1
   characters.

   Hexadecimal numeric characters are used in several protocol elements.

       HEX            = "A" | "B" | "C" | "D" | "E" | "F"
                      | "a" | "b" | "c" | "d" | "e" | "f" | DIGIT

   Many HTTP/1.0 header field values consist of words separated by LWS
   or special characters. These special characters must be in a quoted
   string to be used within a parameter value.

       word           = token | quoted-string

RFC1945 - Page 12

       token          = 1*<any CHAR except CTLs or tspecials>

       tspecials      = "(" | ")" | "<" | ">" | "@"
                      | "," | ";" | ":" | "\" | <">
                      | "/" | "[" | "]" | "?" | "="
                      | "{" | "}" | SP | HT

   Comments may be included in some HTTP header fields by surrounding
   the comment text with parentheses. Comments are only allowed in
   fields containing "comment" as part of their field value definition.
   In all other fields, parentheses are considered part of the field
   value.

       comment        = "(" *( ctext | comment ) ")"
       ctext          = <any TEXT excluding "(" and ")">

   A string of text is parsed as a single word if it is quoted using
   double-quote marks.

       quoted-string  = ( <"> *(qdtext) <"> )

       qdtext         = <any CHAR except <"> and CTLs,
                        but including LWS>

   Single-character quoting using the backslash ("\") character is not
   permitted in HTTP/1.0.

3.  Protocol Parameters

3.1  HTTP Version

   HTTP uses a "<major>.<minor>" numbering scheme to indicate versions
   of the protocol. The protocol versioning policy is intended to allow
   the sender to indicate the format of a message and its capacity for
   understanding further HTTP communication, rather than the features
   obtained via that communication. No change is made to the version
   number for the addition of message components which do not affect
   communication behavior or which only add to extensible field values.
   The <minor> number is incremented when the changes made to the
   protocol add features which do not change the general message parsing
   algorithm, but which may add to the message semantics and imply
   additional capabilities of the sender. The <major> number is
   incremented when the format of a message within the protocol is
   changed.

   The version of an HTTP message is indicated by an HTTP-Version field
   in the first line of the message. If the protocol version is not
   specified, the recipient must assume that the message is in the

RFC1945 - Page 13

   simple HTTP/0.9 format.

       HTTP-Version   = "HTTP" "/" 1*DIGIT "." 1*DIGIT

   Note that the major and minor numbers should be treated as separate
   integers and that each may be incremented higher than a single digit.
   Thus, HTTP/2.4 is a lower version than HTTP/2.13, which in turn is
   lower than HTTP/12.3. Leading zeros should be ignored by recipients
   and never generated by senders.

   This document defines both the 0.9 and 1.0 versions of the HTTP
   protocol. Applications sending Full-Request or Full-Response
   messages, as defined by this specification, must include an HTTP-
   Version of "HTTP/1.0".

   HTTP/1.0 servers must:

      o recognize the format of the Request-Line for HTTP/0.9 and
        HTTP/1.0 requests;

      o understand any valid request in the format of HTTP/0.9 or
        HTTP/1.0;

      o respond appropriately with a message in the same protocol
        version used by the client.

   HTTP/1.0 clients must:

      o recognize the format of the Status-Line for HTTP/1.0 responses;

      o understand any valid response in the format of HTTP/0.9 or
        HTTP/1.0.

   Proxy and gateway applications must be careful in forwarding requests
   that are received in a format different than that of the
   application's native HTTP version. Since the protocol version
   indicates the protocol capability of the sender, a proxy/gateway must
   never send a message with a version indicator which is greater than
   its native version; if a higher version request is received, the
   proxy/gateway must either downgrade the request version or respond
   with an error. Requests with a version lower than that of the
   application's native format may be upgraded before being forwarded;
   the proxy/gateway's response to that request must follow the server
   requirements listed above.

RFC1945 - Page 14

3.2  Uniform Resource Identifiers

   URIs have been known by many names: WWW addresses, Universal Document
   Identifiers, Universal Resource Identifiers [2], and finally the
   combination of Uniform Resource Locators (URL) [4] and Names (URN)
   [16]. As far as HTTP is concerned, Uniform Resource Identifiers are
   simply formatted strings which identify--via name, location, or any
   other characteristic--a network resource.

3.2.1 General Syntax

   URIs in HTTP can be represented in absolute form or relative to some
   known base URI [9], depending upon the context of their use. The two
   forms are differentiated by the fact that absolute URIs always begin
   with a scheme name followed by a colon.

       URI            = ( absoluteURI | relativeURI ) [ "#" fragment ]

       absoluteURI    = scheme ":" *( uchar | reserved )

       relativeURI    = net_path | abs_path | rel_path

       net_path       = "//" net_loc [ abs_path ]
       abs_path       = "/" rel_path
       rel_path       = [ path ] [ ";" params ] [ "?" query ]

       path           = fsegment *( "/" segment )
       fsegment       = 1*pchar
       segment        = *pchar

       params         = param *( ";" param )
       param          = *( pchar | "/" )

       scheme         = 1*( ALPHA | DIGIT | "+" | "-" | "." )
       net_loc        = *( pchar | ";" | "?" )
       query          = *( uchar | reserved )
       fragment       = *( uchar | reserved )

       pchar          = uchar | ":" | "@" | "&" | "=" | "+"
       uchar          = unreserved | escape
       unreserved     = ALPHA | DIGIT | safe | extra | national

       escape         = "%" HEX HEX
       reserved       = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+"
       extra          = "!" | "*" | "'" | "(" | ")" | ","
       safe           = "$" | "-" | "_" | "."
       unsafe         = CTL | SP | <"> | "#" | "%" | "<" | ">"
       national       = <any OCTET excluding ALPHA, DIGIT,

RFC1945 - Page 15

                        reserved, extra, safe, and unsafe>

   For definitive information on URL syntax and semantics, see RFC 1738
   [4] and RFC 1808 [9]. The BNF above includes national characters not
   allowed in valid URLs as specified by RFC 1738, since HTTP servers
   are not restricted in the set of unreserved characters allowed to
   represent the rel_path part of addresses, and HTTP proxies may
   receive requests for URIs not defined by RFC 1738.

3.2.2 http URL

   The "http" scheme is used to locate network resources via the HTTP
   protocol. This section defines the scheme-specific syntax and
   semantics for http URLs.

       http_URL       = "http:" "//" host [ ":" port ] [ abs_path ]

       host           = <A legal Internet host domain name
                         or IP address (in dotted-decimal form),
                         as defined by Section 2.1 of RFC 1123>

       port           = *DIGIT

   If the port is empty or not given, port 80 is assumed. The semantics
   are that the identified resource is located at the server listening
   for TCP connections on that port of that host, and the Request-URI
   for the resource is abs_path. If the abs_path is not present in the
   URL, it must be given as "/" when used as a Request-URI (Section
   5.1.2).

      Note: Although the HTTP protocol is independent of the transport
      layer protocol, the http URL only identifies resources by their
      TCP location, and thus non-TCP resources must be identified by
      some other URI scheme.

   The canonical form for "http" URLs is obtained by converting any
   UPALPHA characters in host to their LOALPHA equivalent (hostnames are
   case-insensitive), eliding the [ ":" port ] if the port is 80, and
   replacing an empty abs_path with "/".

3.3  Date/Time Formats

   HTTP/1.0 applications have historically allowed three different
   formats for the representation of date/time stamps:

       Sun, 06 Nov 1994 08:49:37 GMT    ; RFC 822, updated by RFC 1123
       Sunday, 06-Nov-94 08:49:37 GMT   ; RFC 850, obsoleted by RFC 1036
       Sun Nov  6 08:49:37 1994         ; ANSI C's asctime() format

RFC1945 - Page 16

   The first format is preferred as an Internet standard and represents
   a fixed-length subset of that defined by RFC 1123 [6] (an update to
   RFC 822 [7]). The second format is in common use, but is based on the
   obsolete RFC 850 [10] date format and lacks a four-digit year.
   HTTP/1.0 clients and servers that parse the date value should accept
   all three formats, though they must never generate the third
   (asctime) format.

      Note: Recipients of date values are encouraged to be robust in
      accepting date values that may have been generated by non-HTTP
      applications, as is sometimes the case when retrieving or posting
      messages via proxies/gateways to SMTP or NNTP.

   All HTTP/1.0 date/time stamps must be represented in Universal Time
   (UT), also known as Greenwich Mean Time (GMT), without exception.
   This is indicated in the first two formats by the inclusion of "GMT"
   as the three-letter abbreviation for time zone, and should be assumed
   when reading the asctime format.

       HTTP-date      = rfc1123-date | rfc850-date | asctime-date

       rfc1123-date   = wkday "," SP date1 SP time SP "GMT"
       rfc850-date    = weekday "," SP date2 SP time SP "GMT"
       asctime-date   = wkday SP date3 SP time SP 4DIGIT

       date1          = 2DIGIT SP month SP 4DIGIT
                        ; day month year (e.g., 02 Jun 1982)
       date2          = 2DIGIT "-" month "-" 2DIGIT
                        ; day-month-year (e.g., 02-Jun-82)
       date3          = month SP ( 2DIGIT | ( SP 1DIGIT ))
                        ; month day (e.g., Jun  2)

       time           = 2DIGIT ":" 2DIGIT ":" 2DIGIT
                        ; 00:00:00 - 23:59:59

       wkday          = "Mon" | "Tue" | "Wed"
                      | "Thu" | "Fri" | "Sat" | "Sun"

       weekday        = "Monday" | "Tuesday" | "Wednesday"
                      | "Thursday" | "Friday" | "Saturday" | "Sunday"

       month          = "Jan" | "Feb" | "Mar" | "Apr"
                      | "May" | "Jun" | "Jul" | "Aug"
                      | "Sep" | "Oct" | "Nov" | "Dec"

       Note: HTTP requirements for the date/time stamp format apply
       only to their usage within the protocol stream. Clients and
       servers are not required to use these formats for user

RFC1945 - Page 17

       presentation, request logging, etc.

3.4  Character Sets

   HTTP uses the same definition of the term "character set" as that
   described for MIME:

      The term "character set" is used in this document to refer to a
      method used with one or more tables to convert a sequence of
      octets into a sequence of characters. Note that unconditional
      conversion in the other direction is not required, in that not all
      characters may be available in a given character set and a
      character set may provide more than one sequence of octets to
      represent a particular character. This definition is intended to
      allow various kinds of character encodings, from simple single-
      table mappings such as US-ASCII to complex table switching methods
      such as those that use ISO 2022's techniques. However, the
      definition associated with a MIME character set name must fully
      specify the mapping to be performed from octets to characters. In
      particular, use of external profiling information to determine the
      exact mapping is not permitted.

      Note: This use of the term "character set" is more commonly
      referred to as a "character encoding." However, since HTTP and
      MIME share the same registry, it is important that the terminology
      also be shared.

   HTTP character sets are identified by case-insensitive tokens. The
   complete set of tokens are defined by the IANA Character Set registry
   [15]. However, because that registry does not define a single,
   consistent token for each character set, we define here the preferred
   names for those character sets most likely to be used with HTTP
   entities. These character sets include those registered by RFC 1521
   [5] -- the US-ASCII [17] and ISO-8859 [18] character sets -- and
   other names specifically recommended for use within MIME charset
   parameters.

     charset = "US-ASCII"
             | "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
             | "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
             | "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
             | "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
             | "UNICODE-1-1" | "UNICODE-1-1-UTF-7" | "UNICODE-1-1-UTF-8"
             | token

   Although HTTP allows an arbitrary token to be used as a charset
   value, any token that has a predefined value within the IANA
   Character Set registry [15] must represent the character set defined

RFC1945 - Page 18

   by that registry. Applications should limit their use of character
   sets to those defined by the IANA registry.

   The character set of an entity body should be labelled as the lowest
   common denominator of the character codes used within that body, with
   the exception that no label is preferred over the labels US-ASCII or
   ISO-8859-1.

3.5  Content Codings

   Content coding values are used to indicate an encoding transformation
   that has been applied to a resource. Content codings are primarily
   used to allow a document to be compressed or encrypted without losing
   the identity of its underlying media type. Typically, the resource is
   stored in this encoding and only decoded before rendering or
   analogous usage.

       content-coding = "x-gzip" | "x-compress" | token

       Note: For future compatibility, HTTP/1.0 applications should
       consider "gzip" and "compress" to be equivalent to "x-gzip"
       and "x-compress", respectively.

   All content-coding values are case-insensitive. HTTP/1.0 uses
   content-coding values in the Content-Encoding (Section 10.3) header
   field. Although the value describes the content-coding, what is more
   important is that it indicates what decoding mechanism will be
   required to remove the encoding. Note that a single program may be
   capable of decoding multiple content-coding formats. Two values are
   defined by this specification:

   x-gzip
       An encoding format produced by the file compression program
       "gzip" (GNU zip) developed by Jean-loup Gailly. This format is
       typically a Lempel-Ziv coding (LZ77) with a 32 bit CRC.

   x-compress
       The encoding format produced by the file compression program
       "compress". This format is an adaptive Lempel-Ziv-Welch coding
       (LZW).

       Note: Use of program names for the identification of
       encoding formats is not desirable and should be discouraged
       for future encodings. Their use here is representative of
       historical practice, not good design.

RFC1945 - Page 19

3.6  Media Types

   HTTP uses Internet Media Types [13] in the Content-Type header field
   (Section 10.5) in order to provide open and extensible data typing.

       media-type     = type "/" subtype *( ";" parameter )
       type           = token
       subtype        = token

   Parameters may follow the type/subtype in the form of attribute/value
   pairs.

       parameter      = attribute "=" value
       attribute      = token
       value          = token | quoted-string

   The type, subtype, and parameter attribute names are case-
   insensitive. Parameter values may or may not be case-sensitive,
   depending on the semantics of the parameter name. LWS must not be
   generated between the type and subtype, nor between an attribute and
   its value. Upon receipt of a media type with an unrecognized
   parameter, a user agent should treat the media type as if the
   unrecognized parameter and its value were not present.

   Some older HTTP applications do not recognize media type parameters.
   HTTP/1.0 applications should only use media type parameters when they
   are necessary to define the content of a message.

   Media-type values are registered with the Internet Assigned Number
   Authority (IANA [15]). The media type registration process is
   outlined in RFC 1590 [13]. Use of non-registered media types is
   discouraged.

3.6.1 Canonicalization and Text Defaults

   Internet media types are registered with a canonical form. In
   general, an Entity-Body transferred via HTTP must be represented in
   the appropriate canonical form prior to its transmission. If the body
   has been encoded with a Content-Encoding, the underlying data should
   be in canonical form prior to being encoded.

   Media subtypes of the "text" type use CRLF as the text line break
   when in canonical form. However, HTTP allows the transport of text
   media with plain CR or LF alone representing a line break when used
   consistently within the Entity-Body. HTTP applications must accept
   CRLF, bare CR, and bare LF as being representative of a line break in
   text media received via HTTP.

RFC1945 - Page 20

   In addition, if the text media is represented in a character set that
   does not use octets 13 and 10 for CR and LF respectively, as is the
   case for some multi-byte character sets, HTTP allows the use of
   whatever octet sequences are defined by that character set to
   represent the equivalent of CR and LF for line breaks. This
   flexibility regarding line breaks applies only to text media in the
   Entity-Body; a bare CR or LF should not be substituted for CRLF
   within any of the HTTP control structures (such as header fields and
   multipart boundaries).

   The "charset" parameter is used with some media types to define the
   character set (Section 3.4) of the data. When no explicit charset
   parameter is provided by the sender, media subtypes of the "text"
   type are defined to have a default charset value of "ISO-8859-1" when
   received via HTTP. Data in character sets other than "ISO-8859-1" or
   its subsets must be labelled with an appropriate charset value in
   order to be consistently interpreted by the recipient.

      Note: Many current HTTP servers provide data using charsets other
      than "ISO-8859-1" without proper labelling. This situation reduces
      interoperability and is not recommended. To compensate for this,
      some HTTP user agents provide a configuration option to allow the
      user to change the default interpretation of the media type
      character set when no charset parameter is given.

3.6.2 Multipart Types

   MIME provides for a number of "multipart" types -- encapsulations of
   several entities within a single message's Entity-Body. The multipart
   types registered by IANA [15] do not have any special meaning for
   HTTP/1.0, though user agents may need to understand each type in
   order to correctly interpret the purpose of each body-part. An HTTP
   user agent should follow the same or similar behavior as a MIME user
   agent does upon receipt of a multipart type. HTTP servers should not
   assume that all HTTP clients are prepared to handle multipart types.

   All multipart types share a common syntax and must include a boundary
   parameter as part of the media type value. The message body is itself
   a protocol element and must therefore use only CRLF to represent line
   breaks between body-parts. Multipart body-parts may contain HTTP
   header fields which are significant to the meaning of that part.

3.7  Product Tokens

   Product tokens are used to allow communicating applications to
   identify themselves via a simple product token, with an optional
   slash and version designator. Most fields using product tokens also
   allow subproducts which form a significant part of the application to

RFC1945 - Page 21

   be listed, separated by whitespace. By convention, the products are
   listed in order of their significance for identifying the
   application.

       product         = token ["/" product-version]
       product-version = token

   Examples:

       User-Agent: CERN-LineMode/2.15 libwww/2.17b3

       Server: Apache/0.8.4

   Product tokens should be short and to the point -- use of them for
   advertizing or other non-essential information is explicitly
   forbidden. Although any token character may appear in a product-
   version, this token should only be used for a version identifier
   (i.e., successive versions of the same product should only differ in
   the product-version portion of the product value).

(page 21 continued on part 2)