15 Security Considerations This section is meant to inform application developers, information providers, and users of the security limitations in HTTP/1.1 as described by this document. The discussion does not include definitive solutions to the problems revealed, though it does make some suggestions for reducing security risks. 15.1 Authentication of Clients The Basic authentication scheme is not a secure method of user authentication, nor does it in any way protect the entity, which is transmitted in clear text across the physical network used as the carrier. HTTP does not prevent additional authentication schemes and encryption mechanisms from being employed to increase security or the addition of enhancements (such as schemes to use one-time passwords) to Basic authentication.
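To make the "clear text" point concrete: a Basic credential is simply the base64 encoding of "userid:password", so any party able to observe the Authorization header can recover the password. The short sketch below (Python is used only for illustration, and the credentials shown are hypothetical) demonstrates both the encoding and the trivial decoding.

   import base64

   # Hypothetical credentials, for illustration only; base64 is an
   # encoding for transport, not encryption, so anyone who sees the
   # header can decode it.
   header_value = "Basic " + base64.b64encode(b"alice:opensesame").decode("ascii")

   scheme, _, credentials = header_value.partition(" ")
   if scheme == "Basic":
       user, _, password = base64.b64decode(credentials).decode("latin-1").partition(":")
       print(user, password)   # -> alice opensesame

Because the encoding is reversible by design, it offers no protection for either the entity or the credentials, which is why the enhancements mentioned above are needed for anything sensitive.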
The most serious flaw in Basic authentication is that it results in the essentially clear text transmission of the user's password over the physical network. It is this problem which Digest Authentication attempts to address. Because Basic authentication involves the clear text transmission of passwords it SHOULD never be used (without enhancements) to protect sensitive or valuable information. A common use of Basic authentication is for identification purposes -- requiring the user to provide a user name and password as a means of identification, for example, for purposes of gathering accurate usage statistics on a server. When used in this way it is tempting to think that there is no danger in its use if illicit access to the protected documents is not a major concern. This is only correct if the server issues both user name and password to the users and in particular does not allow the user to choose his or her own password. The danger arises because naive users frequently reuse a single password to avoid the task of maintaining multiple passwords. If a server permits users to select their own passwords, then the threat is not only illicit access to documents on the server but also illicit access to the accounts of all users who have chosen to use their account password. If users are allowed to choose their own password that also means the server must maintain files containing the (presumably encrypted) passwords. Many of these may be the account passwords of users perhaps at distant sites. The owner or administrator of such a system could conceivably incur liability if this information is not maintained in a secure fashion. Basic Authentication is also vulnerable to spoofing by counterfeit servers. If a user can be led to believe that he is connecting to a host containing information protected by basic authentication when in fact he is connecting to a hostile server or gateway then the attacker can request a password, store it for later use, and feign an error. This type of attack is not possible with Digest Authentication [32]. Server implementers SHOULD guard against the possibility of this sort of counterfeiting by gateways or CGI scripts. In particular it is very dangerous for a server to simply turn over a connection to a gateway since that gateway can then use the persistent connection mechanism to engage in multiple transactions with the client while impersonating the original server in a way that is not detectable by the client. 15.2 Offering a Choice of Authentication Schemes An HTTP/1.1 server may return multiple challenges with a 401 (Authenticate) response, and each challenge may use a different
scheme. The order of the challenges returned to the user agent is the order in which the server would prefer they be chosen. The server should order its challenges with the "most secure" authentication scheme first. A user agent should choose, as the challenge to present to the user, the first one that it understands.

When the server offers choices of authentication schemes using the WWW-Authenticate header, the "security" of the authentication is only as good as that of the weakest of the offered schemes: a malicious user could capture the set of challenges and try to authenticate him/herself using the weakest of the authentication schemes. Thus, the ordering serves more to protect the user's credentials than the server's information.

A possible man-in-the-middle (MITM) attack would be to add a weak authentication scheme to the set of choices, hoping that the client will use one that exposes the user's credentials (e.g. password). For this reason, the client should always use the strongest scheme that it understands from the choices offered.

An even better MITM attack would be to remove all offered choices, and to insert a challenge that requests Basic authentication. For this reason, user agents that are concerned about this kind of attack could remember the strongest authentication scheme ever requested by a server and produce a warning message that requires user confirmation before using a weaker one. A particularly insidious way to mount such a MITM attack would be to offer a "free" proxy caching service to gullible users.

15.3 Abuse of Server Log Information

A server is in the position to save personal data about a user's requests which may identify their reading patterns or subjects of interest. This information is clearly confidential in nature and its handling may be constrained by law in certain countries. People using the HTTP protocol to provide data are responsible for ensuring that such material is not distributed without the permission of any individuals that are identifiable by the published results.

15.4 Transfer of Sensitive Information

Like any generic data transfer protocol, HTTP cannot regulate the content of the data that is transferred, nor is there any a priori method of determining the sensitivity of any particular piece of information within the context of any given request. Therefore, applications SHOULD supply as much control over this information as possible to the provider of that information. Four header fields are worth special mention in this context: Server, Via, Referer and From.
Revealing the specific software version of the server may allow the server machine to become more vulnerable to attacks against software that is known to contain security holes. Implementers SHOULD make the Server header field a configurable option. Proxies which serve as a portal through a network firewall SHOULD take special precautions regarding the transfer of header information that identifies the hosts behind the firewall. In particular, they SHOULD remove, or replace with sanitized versions, any Via fields generated behind the firewall. The Referer field allows reading patterns to be studied and reverse links drawn. Although it can be very useful, its power can be abused if user details are not separated from the information contained in the Referer. Even when the personal information has been removed, the Referer field may indicate a private document's URI whose publication would be inappropriate. The information sent in the From field might conflict with the user's privacy interests or their site's security policy, and hence it SHOULD NOT be transmitted without the user being able to disable, enable, and modify the contents of the field. The user MUST be able to set the contents of this field within a user preference or application defaults configuration. We suggest, though do not require, that a convenient toggle interface be provided for the user to enable or disable the sending of From and Referer information. 15.5 Attacks Based On File and Path Names Implementations of HTTP origin servers SHOULD be careful to restrict the documents returned by HTTP requests to be only those that were intended by the server administrators. If an HTTP server translates HTTP URIs directly into file system calls, the server MUST take special care not to serve files that were not intended to be delivered to HTTP clients. For example, UNIX, Microsoft Windows, and other operating systems use ".." as a path component to indicate a directory level above the current one. On such a system, an HTTP server MUST disallow any such construct in the Request-URI if it would otherwise allow access to a resource outside those intended to be accessible via the HTTP server. Similarly, files intended for reference only internally to the server (such as access control files, configuration files, and script code) MUST be protected from inappropriate retrieval, since they might contain sensitive information. Experience has shown that minor bugs in such HTTP server implementations have turned into security risks.
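As an illustration of the kind of check this requirement implies, the sketch below maps a Request-URI path onto a document root while disallowing any ".." segment. The document root and helper name are hypothetical, and a real server would also need to protect the access control and configuration files mentioned above.

   import os
   from urllib.parse import unquote

   DOC_ROOT = "/var/www/docroot"   # illustrative document root, not part of HTTP

   def resolve_request_path(request_path):
       """Map a Request-URI path onto DOC_ROOT, disallowing ".." segments."""
       decoded = unquote(request_path)              # catch %2E%2E-encoded forms too
       segments = [s for s in decoded.split("/") if s not in ("", ".")]
       if ".." in segments:
           raise PermissionError('".." is not allowed in the Request-URI path')
       candidate = os.path.normpath(os.path.join(DOC_ROOT, *segments))
       # Belt and braces: the result must still lie inside the document root.
       if candidate != DOC_ROOT and not candidate.startswith(DOC_ROOT + os.sep):
           raise PermissionError("path escapes the document root")
       return candidate

   # resolve_request_path("/docs/guide.html") -> "/var/www/docroot/docs/guide.html"
   # resolve_request_path("/docs/../../etc/passwd") raises PermissionError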
15.6 Personal Information HTTP clients are often privy to large amounts of personal information (e.g. the user's name, location, mail address, passwords, encryption keys, etc.), and SHOULD be very careful to prevent unintentional leakage of this information via the HTTP protocol to other sources. We very strongly recommend that a convenient interface be provided for the user to control dissemination of such information, and that designers and implementers be particularly careful in this area. History shows that errors in this area are often both serious security and/or privacy problems, and often generate highly adverse publicity for the implementer's company. 15.7 Privacy Issues Connected to Accept Headers Accept request-headers can reveal information about the user to all servers which are accessed. The Accept-Language header in particular can reveal information the user would consider to be of a private nature, because the understanding of particular languages is often strongly correlated to the membership of a particular ethnic group. User agents which offer the option to configure the contents of an Accept-Language header to be sent in every request are strongly encouraged to let the configuration process include a message which makes the user aware of the loss of privacy involved. An approach that limits the loss of privacy would be for a user agent to omit the sending of Accept-Language headers by default, and to ask the user whether it should start sending Accept-Language headers to a server if it detects, by looking for any Vary response-header fields generated by the server, that such sending could improve the quality of service. Elaborate user-customized accept header fields sent in every request, in particular if these include quality values, can be used by servers as relatively reliable and long-lived user identifiers. Such user identifiers would allow content providers to do click-trail tracking, and would allow collaborating content providers to match cross-server click-trails or form submissions of individual users. Note that for many users not behind a proxy, the network address of the host running the user agent will also serve as a long-lived user identifier. In environments where proxies are used to enhance privacy, user agents should be conservative in offering accept header configuration options to end users. As an extreme privacy measure, proxies could filter the accept headers in relayed requests. General purpose user agents which provide a high degree of header configurability should warn users about the loss of privacy which can be involved.
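The Vary-based approach described above amounts to a very small check on each response. In the sketch below, the header dictionary and the wording of the prompt are purely illustrative assumptions.

   def server_varies_on_accept_language(response_headers):
       """True if a response's Vary field names Accept-Language (or is "*")."""
       vary = response_headers.get("Vary", "")
       fields = {f.strip().lower() for f in vary.split(",") if f.strip()}
       return "accept-language" in fields or "*" in fields

   # The user agent sent no Accept-Language header, but the response says
   # "Vary: Accept-Language", so ask the user before enabling that header.
   if server_varies_on_accept_language({"Vary": "Accept-Language"}):
       print("Sending Accept-Language may improve results; enable it "
             "(at some cost to privacy)?")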
15.8 DNS Spoofing Clients using HTTP rely heavily on the Domain Name Service, and are thus generally prone to security attacks based on the deliberate mis-association of IP addresses and DNS names. Clients need to be cautious in assuming the continuing validity of an IP number/DNS name association. In particular, HTTP clients SHOULD rely on their name resolver for confirmation of an IP number/DNS name association, rather than caching the result of previous host name lookups. Many platforms already can cache host name lookups locally when appropriate, and they SHOULD be configured to do so. These lookups should be cached, however, only when the TTL (Time To Live) information reported by the name server makes it likely that the cached information will remain useful. If HTTP clients cache the results of host name lookups in order to achieve a performance improvement, they MUST observe the TTL information reported by DNS. If HTTP clients do not observe this rule, they could be spoofed when a previously-accessed server's IP address changes. As network renumbering is expected to become increasingly common, the possibility of this form of attack will grow. Observing this requirement thus reduces this potential security vulnerability. This requirement also improves the load-balancing behavior of clients for replicated servers using the same DNS name and reduces the likelihood of a user's experiencing failure in accessing sites which use that strategy. 15.9 Location Headers and Spoofing If a single server supports multiple organizations that do not trust one another, then it must check the values of Location and Content- Location headers in responses that are generated under control of said organizations to make sure that they do not attempt to invalidate resources over which they have no authority. 16 Acknowledgments This specification makes heavy use of the augmented BNF and generic constructs defined by David H. Crocker for RFC 822. Similarly, it reuses many of the definitions provided by Nathaniel Borenstein and Ned Freed for MIME. We hope that their inclusion in this specification will help reduce past confusion over the relationship between HTTP and Internet mail message formats.
The HTTP protocol has evolved considerably over the past four years. It has benefited from a large and active developer community--the many people who have participated on the www-talk mailing list--and it is that community which has been most responsible for the success of HTTP and of the World-Wide Web in general. Marc Andreessen, Robert Cailliau, Daniel W. Connolly, Bob Denny, John Franks, Jean-Francois Groff, Phillip M. Hallam-Baker, Hakon W. Lie, Ari Luotonen, Rob McCool, Lou Montulli, Dave Raggett, Tony Sanders, and Marc VanHeyningen deserve special recognition for their efforts in defining early aspects of the protocol. This document has benefited greatly from the comments of all those participating in the HTTP-WG. In addition to those already mentioned, the following individuals have contributed to this specification: Gary Adams Albert Lunde Harald Tveit Alvestrand John C. Mallery Keith Ball Jean-Philippe Martin-Flatin Brian Behlendorf Larry Masinter Paul Burchard Mitra Maurizio Codogno David Morris Mike Cowlishaw Gavin Nicol Roman Czyborra Bill Perry Michael A. Dolan Jeffrey Perry David J. Fiander Scott Powers Alan Freier Owen Rees Marc Hedlund Luigi Rizzo Greg Herlihy David Robinson Koen Holtman Marc Salomon Alex Hopmann Rich Salz Bob Jernigan Allan M. Schiffman Shel Kaphan Jim Seidman Rohit Khare Chuck Shotton John Klensin Eric W. Sink Martijn Koster Simon E. Spero Alexei Kosut Richard N. Taylor David M. Kristol Robert S. Thau Daniel LaLiberte Bill (BearHeart) Weinman Ben Laurie Francois Yergeau Paul J. Leach Mary Ellen Zurko Daniel DuBois Much of the content and presentation of the caching design is due to suggestions and comments from individuals including: Shel Kaphan, Paul Leach, Koen Holtman, David Morris, and Larry Masinter.
Most of the specification of ranges is based on work originally done by Ari Luotonen and John Franks, with additional input from Steve Zilles. Thanks to the "cave men" of Palo Alto. You know who you are. Jim Gettys (the current editor of this document) wishes particularly to thank Roy Fielding, the previous editor of this document, along with John Klensin, Jeff Mogul, Paul Leach, Dave Kristol, Koen Holtman, John Franks, Alex Hopmann, and Larry Masinter for their help. 17 References [1] Alvestrand, H., "Tags for the identification of languages", RFC 1766, UNINETT, March 1995. [2] Anklesaria, F., McCahill, M., Lindner, P., Johnson, D., Torrey, D., and B. Alberti. "The Internet Gopher Protocol: (a distributed document search and retrieval protocol)", RFC 1436, University of Minnesota, March 1993. [3] Berners-Lee, T., "Universal Resource Identifiers in WWW", A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web", RFC 1630, CERN, June 1994. [4] Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform Resource Locators (URL)", RFC 1738, CERN, Xerox PARC, University of Minnesota, December 1994. [5] Berners-Lee, T., and D. Connolly, "HyperText Markup Language Specification - 2.0", RFC 1866, MIT/LCS, November 1995. [6] Berners-Lee, T., Fielding, R., and H. Frystyk, "Hypertext Transfer Protocol -- HTTP/1.0.", RFC 1945 MIT/LCS, UC Irvine, May 1996. [7] Freed, N., and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, Innosoft, First Virtual, November 1996. [8] Braden, R., "Requirements for Internet hosts - application and support", STD 3, RFC 1123, IETF, October 1989. [9] Crocker, D., "Standard for the Format of ARPA Internet Text Messages", STD 11, RFC 822, UDEL, August 1982.
[10] Davis, F., Kahle, B., Morris, H., Salem, J., Shen, T., Wang, R., Sui, J., and M. Grinbaum. "WAIS Interface Protocol Prototype Functional Specification", (v1.5), Thinking Machines Corporation, April 1990. [11] Fielding, R., "Relative Uniform Resource Locators", RFC 1808, UC Irvine, June 1995. [12] Horton, M., and R. Adams. "Standard for interchange of USENET messages", RFC 1036, AT&T Bell Laboratories, Center for Seismic Studies, December 1987. [13] Kantor, B., and P. Lapsley. "Network News Transfer Protocol." A Proposed Standard for the Stream-Based Transmission of News", RFC 977, UC San Diego, UC Berkeley, February 1986. [14] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text", RFC 2047, University of Tennessee, November 1996. [15] Nebel, E., and L. Masinter. "Form-based File Upload in HTML", RFC 1867, Xerox Corporation, November 1995. [16] Postel, J., "Simple Mail Transfer Protocol", STD 10, RFC 821, USC/ISI, August 1982. [17] Postel, J., "Media Type Registration Procedure", RFC 2048, USC/ISI, November 1996. [18] Postel, J., and J. Reynolds, "File Transfer Protocol (FTP)", STD 9, RFC 959, USC/ISI, October 1985. [19] Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC 1700, USC/ISI, October 1994. [20] Sollins, K., and L. Masinter, "Functional Requirements for Uniform Resource Names", RFC 1737, MIT/LCS, Xerox Corporation, December 1994. [21] US-ASCII. Coded Character Set - 7-Bit American Standard Code for Information Interchange. Standard ANSI X3.4-1986, ANSI, 1986. [22] ISO-8859. International Standard -- Information Processing -- 8-bit Single-Byte Coded Graphic Character Sets -- Part 1: Latin alphabet No. 1, ISO 8859-1:1987. Part 2: Latin alphabet No. 2, ISO 8859-2, 1987. Part 3: Latin alphabet No. 3, ISO 8859-3, 1988. Part 4: Latin alphabet No. 4, ISO 8859-4, 1988.
Part 5: Latin/Cyrillic alphabet, ISO 8859-5, 1988. Part 6: Latin/Arabic alphabet, ISO 8859-6, 1987. Part 7: Latin/Greek alphabet, ISO 8859-7, 1987. Part 8: Latin/Hebrew alphabet, ISO 8859-8, 1988. Part 9: Latin alphabet No. 5, ISO 8859-9, 1990. [23] Meyers, J., and M. Rose "The Content-MD5 Header Field", RFC 1864, Carnegie Mellon, Dover Beach Consulting, October, 1995. [24] Carpenter, B., and Y. Rekhter, "Renumbering Needs Work", RFC 1900, IAB, February 1996. [25] Deutsch, P., "GZIP file format specification version 4.3." RFC 1952, Aladdin Enterprises, May 1996. [26] Venkata N. Padmanabhan and Jeffrey C. Mogul. Improving HTTP Latency. Computer Networks and ISDN Systems, v. 28, pp. 25-35, Dec. 1995. Slightly revised version of paper in Proc. 2nd International WWW Conf. '94: Mosaic and the Web, Oct. 1994, which is available at http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/DDay/mogul/ HTTPLatency.html. [27] Joe Touch, John Heidemann, and Katia Obraczka, "Analysis of HTTP Performance", <URL: http://www.isi.edu/lsam/ib/http-perf/>, USC/Information Sciences Institute, June 1996 [28] Mills, D., "Network Time Protocol, Version 3, Specification, Implementation and Analysis", RFC 1305, University of Delaware, March 1992. [29] Deutsch, P., "DEFLATE Compressed Data Format Specification version 1.3." RFC 1951, Aladdin Enterprises, May 1996. [30] Spero, S., "Analysis of HTTP Performance Problems" <URL:http://sunsite.unc.edu/mdma-release/http-prob.html>. [31] Deutsch, P., and J-L. Gailly, "ZLIB Compressed Data Format Specification version 3.3", RFC 1950, Aladdin Enterprises, Info-ZIP, May 1996. [32] Franks, J., Hallam-Baker, P., Hostetler, J., Leach, P., Luotonen, A., Sink, E., and L. Stewart, "An Extension to HTTP : Digest Access Authentication", RFC 2069, January 1997.
18 Authors' Addresses

Roy T. Fielding
Department of Information and Computer Science
University of California
Irvine, CA 92717-3425, USA
Fax: +1 (714) 824-4056
EMail: fielding@ics.uci.edu

Jim Gettys
MIT Laboratory for Computer Science
545 Technology Square
Cambridge, MA 02139, USA
Fax: +1 (617) 258 8682
EMail: jg@w3.org

Jeffrey C. Mogul
Western Research Laboratory
Digital Equipment Corporation
250 University Avenue
Palo Alto, California, 94305, USA
EMail: mogul@wrl.dec.com

Henrik Frystyk Nielsen
W3 Consortium
MIT Laboratory for Computer Science
545 Technology Square
Cambridge, MA 02139, USA
Fax: +1 (617) 258 8682
EMail: frystyk@w3.org

Tim Berners-Lee
Director, W3 Consortium
MIT Laboratory for Computer Science
545 Technology Square
Cambridge, MA 02139, USA
Fax: +1 (617) 258 8682
EMail: timbl@w3.org
19 Appendices

19.1 Internet Media Type message/http

In addition to defining the HTTP/1.1 protocol, this document serves as the specification for the Internet media type "message/http". The following is to be registered with IANA.

       Media Type name:         message
       Media subtype name:      http
       Required parameters:     none
       Optional parameters:     version, msgtype

       version: The HTTP-Version number of the enclosed message (e.g.,
                "1.1"). If not present, the version can be determined
                from the first line of the body.

       msgtype: The message type -- "request" or "response". If not
                present, the type can be determined from the first
                line of the body.

       Encoding considerations: only "7bit", "8bit", or "binary" are
                                permitted

       Security considerations: none

19.2 Internet Media Type multipart/byteranges

When an HTTP message includes the content of multiple ranges (for example, a response to a request for multiple non-overlapping ranges), these are transmitted as a multipart MIME message. The multipart media type for this purpose is called "multipart/byteranges".

The multipart/byteranges media type includes two or more parts, each with its own Content-Type and Content-Range fields. The parts are separated using a MIME boundary parameter.

       Media Type name:         multipart
       Media subtype name:      byteranges
       Required parameters:     boundary
       Optional parameters:     none

       Encoding considerations: only "7bit", "8bit", or "binary" are
                                permitted

       Security considerations: none
For example:

   HTTP/1.1 206 Partial content
   Date: Wed, 15 Nov 1995 06:25:24 GMT
   Last-modified: Wed, 15 Nov 1995 04:58:08 GMT
   Content-type: multipart/byteranges; boundary=THIS_STRING_SEPARATES

   --THIS_STRING_SEPARATES
   Content-type: application/pdf
   Content-range: bytes 500-999/8000

   ...the first range...
   --THIS_STRING_SEPARATES
   Content-type: application/pdf
   Content-range: bytes 7000-7999/8000

   ...the second range
   --THIS_STRING_SEPARATES--

19.3 Tolerant Applications

Although this document specifies the requirements for the generation of HTTP/1.1 messages, not all applications will be correct in their implementation. We therefore recommend that operational applications be tolerant of deviations whenever those deviations can be interpreted unambiguously.

Clients SHOULD be tolerant in parsing the Status-Line and servers tolerant when parsing the Request-Line. In particular, they SHOULD accept any amount of SP or HT characters between fields, even though only a single SP is required.

The line terminator for message-header fields is the sequence CRLF. However, we recommend that applications, when parsing such headers, recognize a single LF as a line terminator and ignore the leading CR.

The character set of an entity-body should be labeled as the lowest common denominator of the character codes used within that body, with the exception that no label is preferred over the labels US-ASCII or ISO-8859-1.

Additional rules for requirements on parsing and encoding of dates and other potential problems with date encodings include:

o HTTP/1.1 clients and caches should assume that an RFC-850 date which appears to be more than 50 years in the future is in fact in the past (this helps solve the "year 2000" problem).
o An HTTP/1.1 implementation may internally represent a parsed Expires date as earlier than the proper value, but MUST NOT internally represent a parsed Expires date as later than the proper value.

o All expiration-related calculations must be done in GMT. The local time zone MUST NOT influence the calculation or comparison of an age or expiration time.

o If an HTTP header incorrectly carries a date value with a time zone other than GMT, it must be converted into GMT using the most conservative possible conversion.

19.4 Differences Between HTTP Entities and MIME Entities

HTTP/1.1 uses many of the constructs defined for Internet Mail (RFC 822) and the Multipurpose Internet Mail Extensions (MIME) to allow entities to be transmitted in an open variety of representations and with extensible mechanisms. However, MIME [7] discusses mail, and HTTP has a few features that are different from those described in MIME. These differences were carefully chosen to optimize performance over binary connections, to allow greater freedom in the use of new media types, to make date comparisons easier, and to acknowledge the practice of some early HTTP servers and clients.

This appendix describes specific areas where HTTP differs from MIME. Proxies and gateways to strict MIME environments SHOULD be aware of these differences and provide the appropriate conversions where necessary. Proxies and gateways from MIME environments to HTTP also need to be aware of the differences because some conversions may be required.

19.4.1 Conversion to Canonical Form

MIME requires that an Internet mail entity be converted to canonical form prior to being transferred. Section 3.7.1 of this document describes the forms allowed for subtypes of the "text" media type when transmitted over HTTP. MIME requires that content with a type of "text" represent line breaks as CRLF and forbids the use of CR or LF outside of line break sequences. HTTP allows CRLF, bare CR, and bare LF to indicate a line break within text content when a message is transmitted over HTTP.

Where it is possible, a proxy or gateway from HTTP to a strict MIME environment SHOULD translate all line breaks within the text media types described in section 3.7.1 of this document to the MIME canonical form of CRLF. Note, however, that this may be complicated by the presence of a Content-Encoding and by the fact that HTTP
allows the use of some character sets which do not use octets 13 and 10 to represent CR and LF, as is the case for some multi-byte character sets. 19.4.2 Conversion of Date Formats HTTP/1.1 uses a restricted set of date formats (section 3.3.1) to simplify the process of date comparison. Proxies and gateways from other protocols SHOULD ensure that any Date header field present in a message conforms to one of the HTTP/1.1 formats and rewrite the date if necessary. 19.4.3 Introduction of Content-Encoding MIME does not include any concept equivalent to HTTP/1.1's Content- Encoding header field. Since this acts as a modifier on the media type, proxies and gateways from HTTP to MIME-compliant protocols MUST either change the value of the Content-Type header field or decode the entity-body before forwarding the message. (Some experimental applications of Content-Type for Internet mail have used a media-type parameter of ";conversions=<content-coding>" to perform an equivalent function as Content-Encoding. However, this parameter is not part of MIME.) 19.4.4 No Content-Transfer-Encoding HTTP does not use the Content-Transfer-Encoding (CTE) field of MIME. Proxies and gateways from MIME-compliant protocols to HTTP MUST remove any non-identity CTE ("quoted-printable" or "base64") encoding prior to delivering the response message to an HTTP client. Proxies and gateways from HTTP to MIME-compliant protocols are responsible for ensuring that the message is in the correct format and encoding for safe transport on that protocol, where "safe transport" is defined by the limitations of the protocol being used. Such a proxy or gateway SHOULD label the data with an appropriate Content-Transfer-Encoding if doing so will improve the likelihood of safe transport over the destination protocol. 19.4.5 HTTP Header Fields in Multipart Body-Parts In MIME, most header fields in multipart body-parts are generally ignored unless the field name begins with "Content-". In HTTP/1.1, multipart body-parts may contain any HTTP header fields which are significant to the meaning of that part.
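To illustrate the rule of section 19.4.4 above, a gateway from a MIME environment to HTTP must hand the client a decoded entity-body rather than its quoted-printable or base64 form. A minimal sketch using Python's standard mail parser follows; the function name is an assumption and multipart messages are not handled.

   from email import message_from_bytes

   def entity_body_for_http(raw_mime_message):
       """Decode any base64/quoted-printable CTE before the body goes over HTTP."""
       msg = message_from_bytes(raw_mime_message)
       # get_payload(decode=True) undoes base64 and quoted-printable encodings;
       # the Content-Transfer-Encoding field itself is not forwarded over HTTP.
       body = msg.get_payload(decode=True) or b""
       return msg.get_content_type(), body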
19.4.6 Introduction of Transfer-Encoding

HTTP/1.1 introduces the Transfer-Encoding header field (section 14.40). Proxies/gateways MUST remove any transfer coding prior to forwarding a message via a MIME-compliant protocol.

A process for decoding the "chunked" transfer coding (section 3.6) can be represented in pseudo-code as:

       length := 0
       read chunk-size, chunk-ext (if any) and CRLF
       while (chunk-size > 0) {
          read chunk-data and CRLF
          append chunk-data to entity-body
          length := length + chunk-size
          read chunk-size and CRLF
       }
       read entity-header
       while (entity-header not empty) {
          append entity-header to existing header fields
          read entity-header
       }
       Content-Length := length
       Remove "chunked" from Transfer-Encoding

19.4.7 MIME-Version

HTTP is not a MIME-compliant protocol (see appendix 19.4). However, HTTP/1.1 messages may include a single MIME-Version general-header field to indicate what version of the MIME protocol was used to construct the message. Use of the MIME-Version header field indicates that the message is in full compliance with the MIME protocol. Proxies/gateways are responsible for ensuring full compliance (where possible) when exporting HTTP messages to strict MIME environments.

       MIME-Version = "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

MIME version "1.0" is the default for use in HTTP/1.1. However, HTTP/1.1 message parsing and semantics are defined by this document and not the MIME specification.

19.5 Changes from HTTP/1.0

This section summarizes major differences between versions HTTP/1.0 and HTTP/1.1.
19.5.1 Changes to Simplify Multi-homed Web Servers and Conserve IP Addresses

The requirements that clients and servers support the Host request-header, report an error if the Host request-header (section 14.23) is missing from an HTTP/1.1 request, and accept absolute URIs (section 5.1.2) are among the most important changes defined by this specification.

Older HTTP/1.0 clients assumed a one-to-one relationship of IP addresses and servers; there was no other established mechanism for distinguishing the intended server of a request than the IP address to which that request was directed. The changes outlined above will allow the Internet, once older HTTP clients are no longer common, to support multiple Web sites from a single IP address, greatly simplifying large operational Web servers, where allocation of many IP addresses to a single host has created serious problems. The Internet will also be able to recover the IP addresses that have been allocated for the sole purpose of allowing special-purpose domain names to be used in root-level HTTP URLs. Given the rate of growth of the Web, and the number of servers already deployed, it is extremely important that all implementations of HTTP (including updates to existing HTTP/1.0 applications) correctly implement these requirements:

o Both clients and servers MUST support the Host request-header.

o Host request-headers are required in HTTP/1.1 requests.

o Servers MUST report a 400 (Bad Request) error if an HTTP/1.1 request does not include a Host request-header.

o Servers MUST accept absolute URIs.
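The server-side consequence of this list can be sketched in a few lines. The helper name and the way the request is represented are illustrative assumptions; only the 400 (Bad Request) behavior comes from the requirements above.

   def host_header_check(request_line, headers):
       """Return 400 if an HTTP/1.1 request lacks a Host header, else None."""
       version = request_line.rsplit(" ", 1)[-1]
       has_host = any(name.lower() == "host" for name in headers)
       if version == "HTTP/1.1" and not has_host:
           return 400   # Bad Request, per the requirements listed above
       return None

   # host_header_check("GET /index.html HTTP/1.1", {}) -> 400
   # host_header_check("GET /index.html HTTP/1.1", {"Host": "example.org"}) -> None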
19.6 Additional Features This appendix documents protocol elements used by some existing HTTP implementations, but not consistently and correctly across most HTTP/1.1 applications. Implementers should be aware of these features, but cannot rely upon their presence in, or interoperability with, other HTTP/1.1 applications. Some of these describe proposed experimental features, and some describe features that experimental deployment found lacking that are now addressed in the base HTTP/1.1 specification. 19.6.1 Additional Request Methods 19.6.1.1 PATCH The PATCH method is similar to PUT except that the entity contains a list of differences between the original version of the resource identified by the Request-URI and the desired content of the resource after the PATCH action has been applied. The list of differences is in a format defined by the media type of the entity (e.g., "application/diff") and MUST include sufficient information to allow the server to recreate the changes necessary to convert the original version of the resource to the desired version. If the request passes through a cache and the Request-URI identifies a currently cached entity, that entity MUST be removed from the cache. Responses to this method are not cachable. The actual method for determining how the patched resource is placed, and what happens to its predecessor, is defined entirely by the origin server. If the original version of the resource being patched included a Content-Version header field, the request entity MUST include a Derived-From header field corresponding to the value of the original Content-Version header field. Applications are encouraged to use these fields for constructing versioning relationships and resolving version conflicts. PATCH requests must obey the message transmission requirements set out in section 8.2. Caches that implement PATCH should invalidate cached responses as defined in section 13.10 for PUT. 19.6.1.2 LINK The LINK method establishes one or more Link relationships between the existing resource identified by the Request-URI and other existing resources. The difference between LINK and other methods
allowing links to be established between resources is that the LINK method does not allow any message-body to be sent in the request and does not directly result in the creation of new resources. If the request passes through a cache and the Request-URI identifies a currently cached entity, that entity MUST be removed from the cache. Responses to this method are not cachable. Caches that implement LINK should invalidate cached responses as defined in section 13.10 for PUT. 19.6.1.3 UNLINK The UNLINK method removes one or more Link relationships from the existing resource identified by the Request-URI. These relationships may have been established using the LINK method or by any other method supporting the Link header. The removal of a link to a resource does not imply that the resource ceases to exist or becomes inaccessible for future references. If the request passes through a cache and the Request-URI identifies a currently cached entity, that entity MUST be removed from the cache. Responses to this method are not cachable. Caches that implement UNLINK should invalidate cached responses as defined in section 13.10 for PUT. 19.6.2 Additional Header Field Definitions 19.6.2.1 Alternates The Alternates response-header field has been proposed as a means for the origin server to inform the client about other available representations of the requested resource, along with their distinguishing attributes, and thus providing a more reliable means for a user agent to perform subsequent selection of another representation which better fits the desires of its user (described as agent-driven negotiation in section 12).
The Alternates header field is orthogonal to the Vary header field in that both may coexist in a message without affecting the interpretation of the response or the available representations. It is expected that Alternates will provide a significant improvement over the server-driven negotiation provided by the Vary field for those resources that vary over common dimensions like type and language.

The Alternates header field will be defined in a future specification.

19.6.2.2 Content-Version

The Content-Version entity-header field defines the version tag associated with a rendition of an evolving entity. Together with the Derived-From field described in section 19.6.2.3, it allows a group of people to work simultaneously on the creation of a work as an iterative process. The field should be used to allow evolution of a particular work along a single path rather than derived works or renditions in different representations.

       Content-Version = "Content-Version" ":" quoted-string

Examples of the Content-Version field include:

       Content-Version: "2.1.2"
       Content-Version: "Fred 19950116-12:26:48"
       Content-Version: "2.5a4-omega7"

19.6.2.3 Derived-From

The Derived-From entity-header field can be used to indicate the version tag of the resource from which the enclosed entity was derived before modifications were made by the sender. This field is used to help manage the process of merging successive changes to a resource, particularly when such changes are being made in parallel and from multiple sources.

       Derived-From = "Derived-From" ":" quoted-string

An example use of the field is:

       Derived-From: "2.1.1"

The Derived-From field is required for PUT and PATCH requests if the entity being sent was previously retrieved from the same URI and a Content-Version header was included with the entity when it was last retrieved.
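As an illustration of how these two fields work together, a client that earlier retrieved an entity labelled with Content-Version: "2.1.1" and now writes back a modified copy would include a corresponding Derived-From field on its PUT. In the sketch below the host, path, and entity are hypothetical, and Python's http.client is used only as a convenient way to show the headers.

   import http.client

   # Hypothetical host, path, and entity, for illustration only.
   body = b"...modified entity..."
   conn = http.client.HTTPConnection("example.org")
   conn.request("PUT", "/drafts/spec.txt", body, headers={
       "Content-Type": "text/plain",
       # Required here because the entity was previously retrieved from this
       # URI with a Content-Version header (section 19.6.2.3).
       "Derived-From": '"2.1.1"',
   })
   response = conn.getresponse()
   print(response.status, response.reason)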
19.6.2.4 Link

The Link entity-header field provides a means for describing a relationship between two resources, generally between the requested resource and some other resource. An entity MAY include multiple Link values. Links at the metainformation level typically indicate relationships like hierarchical structure and navigation paths. The Link field is semantically equivalent to the <LINK> element in HTML [5].

       Link           = "Link" ":" #("<" URI ">" *( ";" link-param ) )

       link-param     = ( ( "rel" "=" relationship )
                          | ( "rev" "=" relationship )
                          | ( "title" "=" quoted-string )
                          | ( "anchor" "=" <"> URI <"> )
                          | ( link-extension ) )

       link-extension = token [ "=" ( token | quoted-string ) ]

       relationship   = sgml-name
                      | ( <"> sgml-name *( SP sgml-name) <"> )

       sgml-name      = ALPHA *( ALPHA | DIGIT | "." | "-" )

Relationship values are case-insensitive and MAY be extended within the constraints of the sgml-name syntax. The title parameter MAY be used to label the destination of a link such that it can be used as identification within a human-readable menu. The anchor parameter MAY be used to indicate a source anchor other than the entire current resource, such as a fragment of this resource or a third resource.

Examples of usage include:

       Link: <http://www.cern.ch/TheBook/chapter2>; rel="Previous"

       Link: <mailto:timbl@w3.org>; rev="Made"; title="Tim Berners-Lee"

The first example indicates that chapter2 is previous to this resource in a logical navigation path. The second indicates that the person responsible for making the resource available is identified by the given e-mail address.

19.6.2.5 URI

The URI header field has, in past versions of this specification, been used as a combination of the existing Location, Content-Location, and Vary header fields as well as the future Alternates
field (above). Its primary purpose has been to include a list of additional URIs for the resource, including names and mirror locations. However, it has become clear that the combination of many different functions within this single field has been a barrier to consistently and correctly implementing any of those functions. Furthermore, we believe that the identification of names and mirror locations would be better performed via the Link header field. The URI header field is therefore deprecated in favor of those other fields.

       URI-header = "URI" ":" 1#( "<" URI ">" )

19.7 Compatibility with Previous Versions

It is beyond the scope of a protocol specification to mandate compliance with previous versions. HTTP/1.1 was deliberately designed, however, to make supporting previous versions easy. It is worth noting that at the time of composing this specification, we would expect commercial HTTP/1.1 servers to:

o recognize the format of the Request-Line for HTTP/0.9, 1.0, and 1.1 requests;

o understand any valid request in the format of HTTP/0.9, 1.0, or 1.1;

o respond appropriately with a message in the same major version used by the client.

And we would expect HTTP/1.1 clients to:

o recognize the format of the Status-Line for HTTP/1.0 and 1.1 responses;

o understand any valid response in the format of HTTP/0.9, 1.0, or 1.1.

For most implementations of HTTP/1.0, each connection is established by the client prior to the request and closed by the server after sending the response. A few implementations implement the Keep-Alive version of persistent connections described in section 19.7.1.1.
19.7.1 Compatibility with HTTP/1.0 Persistent Connections

Some clients and servers may wish to be compatible with some previous implementations of persistent connections in HTTP/1.0 clients and servers. Persistent connections in HTTP/1.0 must be explicitly negotiated as they are not the default behavior. HTTP/1.0 experimental implementations of persistent connections are faulty, and the new facilities in HTTP/1.1 are designed to rectify these problems. The problem was that some existing 1.0 clients may be sending Keep-Alive to a proxy server that doesn't understand Connection, which would then erroneously forward it to the next inbound server, which would establish the Keep-Alive connection and result in a hung HTTP/1.0 proxy waiting for the close on the response. The result is that HTTP/1.0 clients must be prevented from using Keep-Alive when talking to proxies.

However, talking to proxies is the most important use of persistent connections, so that prohibition is clearly unacceptable. Therefore, we need some other mechanism for indicating a persistent connection is desired, which is safe to use even when talking to an old proxy that ignores Connection. Persistent connections are the default for HTTP/1.1 messages; we introduce a new keyword (Connection: close) for declaring non-persistence.

The following describes the original HTTP/1.0 form of persistent connections.

When it connects to an origin server, an HTTP client MAY send the Keep-Alive connection-token in addition to the Persist connection-token:

       Connection: Keep-Alive

An HTTP/1.0 server would then respond with the Keep-Alive connection token and the client may proceed with an HTTP/1.0 (or Keep-Alive) persistent connection. An HTTP/1.1 server may also establish persistent connections with HTTP/1.0 clients upon receipt of a Keep-Alive connection token. However, a persistent connection with an HTTP/1.0 client cannot make use of the chunked transfer-coding, and therefore MUST use a Content-Length for marking the ending boundary of each message.

A client MUST NOT send the Keep-Alive connection token to a proxy server as HTTP/1.0 proxy servers do not obey the rules of HTTP/1.1 for parsing the Connection header field.
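The compatibility rules above reduce to a simple client-side decision: offer Keep-Alive only on a direct connection to the origin server, never through a proxy. A minimal sketch, in which the function name and return convention are assumptions:

   def request_headers_for_http10_hop(talking_to_proxy):
       """Connection-related headers to send when the next hop may be HTTP/1.0."""
       if talking_to_proxy:
           # An old proxy may blindly forward "Connection: Keep-Alive" and hang
           # the chain, so request nothing special through a proxy.
           return {}
       # Direct connection to the origin server: offering Keep-Alive is safe.
       return {"Connection": "Keep-Alive"}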
19.7.1.1 The Keep-Alive Header

When the Keep-Alive connection-token has been transmitted with a request or a response, a Keep-Alive header field MAY also be included. The Keep-Alive header field takes the following form:

       Keep-Alive-header = "Keep-Alive" ":" 0# keepalive-param

       keepalive-param   = param-name "=" value

The Keep-Alive header itself is optional, and is used only if a parameter is being sent. HTTP/1.1 does not define any parameters.

If the Keep-Alive header is sent, the corresponding connection token MUST be transmitted. The Keep-Alive header MUST be ignored if received without the connection token.