RFC 0883

Domain names: Implementation specification

Pages: 74
Obsoleted by: 1034 1035
Updated by: 0973

Part 2 of 3 – Pages 25 to 49

noToC RFC0883 - Page 25 prevText

      Another query might be:

                          +-----------------------------------------+
            Header        |         OPCODE=CQUERYM, ID=410          |
                          +-----------------------------------------+
           Question       |       QTYPE=A, QCLASS=IN, QNAME=B       |
                          +-----------------------------------------+
            Answer        |                 <empty>                 |
                          +-----------------------------------------+
           Authority      |                 <empty>                 |
                          +-----------------------------------------+
          Additional      |               ARPA NULL IN              |
                          +-----------------------------------------+

      This query is similar to the previous one, but specifies a target
      of ARPA rather than ISI.ARPA.  It also allows multiple matches.
      In this case the same name server might return:

                          +-----------------------------------------+
            Header        |         OPCODE=RESPONSE, ID=410         |
                          +-----------------------------------------+
           Question       |       QTYPE=A, QCLASS=IN, QNAME=B       |
                          +-----------------------------------------+
            Answer        |        B.ISI.ARPA A IN 10.3.0.52        |
                          |                    -                    |
                          |        B.BBN.ARPA A IN 10.0.0.49        |
                          |                    -                    |
                          |        B.BBNCC.ARPA A IN 8.1.0.2        |
                          +-----------------------------------------+
           Authority      |                 <empty>                 |
                          +-----------------------------------------+
          Additional      |               ARPA NULL IN              |
                          +-----------------------------------------+

      This response contains three answers: B.ISI.ARPA, B.BBN.ARPA, and
      B.BBNCC.ARPA.

   Recursive Name Service

      Recursive service is an optional feature of name servers.

      When a name server receives a query regarding a part of the name
      space which is not in one of the name server's zones, the standard
      response is a message that refers the requestor to another name
      server.  By iterating on these referrals, the requestor eventually
      is directed to a name server that has the required information.

      Name servers may also implement recursive service.  In this type
      of service, a name server either answers immediately based on
      local zone information, or pursues the query for the requestor and
      returns the eventual result to the original requestor.

      A name server that supports recursive service sets the Recursion
      Available (RA) bit in all responses it generates.  A requestor
      asks for recursive service by setting the Recursion Desired (RD)
      bit in queries.  In some situations where recursive service is the
      only path to the desired information (see below), the name server
      may pursue the query recursively even if RD is zero.

      If a query requests recursion (RD set), the name server does not
      support recursion, and the query needs recursive service to be
      answered, the name server returns a "Not Implemented" (NI) error
      code.  If the query can be answered without recursion because the
      name server is authoritative for the queried name, the name server
      ignores the RD bit.
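      The decision described in the two paragraphs above can be
      sketched as a small Python function.  This is an illustration
      only: the names Action and choose_action, and the
      implicit_recursion flag (for servers that recurse even when RD is
      zero, as discussed below), are hypothetical.

```python
from enum import Enum


class Action(Enum):
    ANSWER_LOCALLY = "answer"    # authoritative data: the RD bit is ignored
    RECURSE = "recurse"          # pursue the query on the requestor's behalf
    REFER = "refer"              # standard response: refer to another server
    NOT_IMPLEMENTED = "ni"       # reply with the "Not Implemented" RCODE (4)


def choose_action(rd: bool, authoritative: bool, supports_recursion: bool,
                  implicit_recursion: bool = False) -> Action:
    """Pick how to handle one query (hypothetical helper)."""
    if authoritative:
        # The answer comes from a local zone, so recursion is never needed.
        return Action.ANSWER_LOCALLY
    if rd:
        # Recursion was requested and is needed to produce an answer.
        return Action.RECURSE if supports_recursion else Action.NOT_IMPLEMENTED
    if supports_recursion and implicit_recursion:
        # e.g. a gateway server for which recursion is the only path to the data
        return Action.RECURSE
    return Action.REFER
```

      Independently of this choice, a server that offers recursion sets
      RA in every response it generates.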

      Because of the difficulty in selecting appropriate timeouts and
      error handling, recursive service is best suited to virtual
      circuits, although it is allowed for datagrams.

      Recursive service is valuable in several special situations:

         In a system of small personal computers clustered around one or
         more large hosts supporting name servers, the recursive
         approach minimizes the amount of code in the resolvers in the
         personal computers.  Such a design moves complexity out of the
         resolver into the name server, and may be appropriate for such
         systems.

         Name servers on the boundaries of different networks may wish
         to offer recursive service to create connectivity between
         different networks.  Such name servers may wish to provide
         recursive service regardless of the setting of RD.

         Name servers that translate between domain name service and
         some other name service may wish to adopt the recursive style.
         Implicit recursion may be valuable here as well.

      These concepts are still under development.

   Header section format

           +-----------------------------------------------+
           |                                               |
           |             *****  WARNING  *****             |
           |                                               |
           |  The following format is preliminary and is   |
           | included for purposes of explanation only. In |
           | particular, the size and position of the      |
           | OPCODE, RCODE fields and the number and       |
           | meaning of the single bit fields are subject  |
           | to change.                                    |
           |                                               |
           +-----------------------------------------------+

      The header contains the following fields:

                                           1  1  1  1  1  1 
             0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5 
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
           |                      ID                       |
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
           |QR|   Opcode  |AA|TC|RD|RA|        |   RCODE   |
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
           |                    QDCOUNT                    |
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
           |                    ANCOUNT                    |
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
           |                    NSCOUNT                    |
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
           |                    ARCOUNT                    |
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

      where:

      ID      - A 16 bit identifier assigned by the program that
                generates any kind of query.  This identifier is copied
                into all replies and can be used by the requestor to
                relate replies to outstanding questions.

      QR      - A one bit field that specifies whether this message is a
                query (0), or a response (1).

      OPCODE  - A four bit field that specifies the kind of query in
                this message.  This value is set by the originator of a
                query and copied into the response.  The values are:

                        0   a standard query (QUERY)

                        1   an inverse query (IQUERY)

                        2   a completion query allowing multiple
                            answers (CQUERYM)

                        3   a completion query requesting a single
                            answer (CQUERYU)

                        4-15 reserved for future use

      AA      - Authoritative Answer - this bit is valid in responses,
                         and specifies that the responding name server
                         is an authority for the domain name in the
                         corresponding query.

      TC      - TrunCation - specifies that this message was truncated
                         due to length greater than 512 characters.
                         This bit is valid in datagram messages but not
                         in messages sent over virtual circuits.

      RD      - Recursion Desired - this bit may be set in a query and
                         is copied into the response.  If RD is set, it
                         directs the name server to pursue the query
                         recursively.  Recursive query support is
                         optional.

      RA      - Recursion Available - this bit is set or cleared in a
                         response, and denotes whether recursive query
                         support is available in the name server.

      RCODE   - Response code - this 4 bit field is set as part of
                         responses.  The values have the following
                         interpretation:

                        0    No error condition

                        1    Format error - The name server was unable
                             to interpret the query.

                        2    Server failure - The name server was unable
                             to process this query due to a problem with
                             the name server.

                        3    Name Error - Meaningful only for responses
                             from an authoritative name server, this
                             code signifies that the domain name
                             referenced in the query does not exist.

                        4    Not Implemented - The name server does not
                             support the requested kind of query.

                        5    Refused - The name server refuses to
                             perform the specified operation for policy
                             reasons.  For example, a name server may
                             not wish to provide the information to the
                             particular requestor, or a name server may
                             not wish to perform a particular operation
                             (e.g. zone transfer) for particular data.

                        6-15 Reserved for future use.

      QDCOUNT - an unsigned 16 bit integer specifying the number of
                entries in the question section.

      ANCOUNT - an unsigned 16 bit integer specifying the number of
                resource records in the answer section.

      NSCOUNT - an unsigned 16 bit integer specifying the number of name
                server resource records in the authority records
                section.

      ARCOUNT - an unsigned 16 bit integer specifying the number of
                resource records in the additional records section.
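      Because the warning above marks this layout as preliminary, the
      following Python sketch of packing and unpacking the header is
      illustrative only.  It assumes that bit 0 in the diagram is the
      high-order bit of each 16 bit word and that all fields travel in
      network (big-endian) byte order; the Header class and the mask
      names are hypothetical.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass

# Masks for the second 16 bit word; bit 0 of the diagram is the high-order bit.
QR_MASK, AA_MASK, TC_MASK, RD_MASK, RA_MASK = 0x8000, 0x0400, 0x0200, 0x0100, 0x0080
OPCODE_SHIFT = 11        # OPCODE occupies bits 1-4
RCODE_MASK = 0x000F      # RCODE occupies bits 12-15; bits 9-11 are unused


@dataclass
class Header:
    id: int
    qr: bool = False
    opcode: int = 0
    aa: bool = False
    tc: bool = False
    rd: bool = False
    ra: bool = False
    rcode: int = 0
    qdcount: int = 0
    ancount: int = 0
    nscount: int = 0
    arcount: int = 0

    def pack(self) -> bytes:
        flags = ((self.opcode & 0xF) << OPCODE_SHIFT) | (self.rcode & RCODE_MASK)
        for is_set, mask in ((self.qr, QR_MASK), (self.aa, AA_MASK), (self.tc, TC_MASK),
                             (self.rd, RD_MASK), (self.ra, RA_MASK)):
            if is_set:
                flags |= mask
        # "!6H": six unsigned 16 bit integers in network (big-endian) order
        return struct.pack("!6H", self.id, flags, self.qdcount,
                           self.ancount, self.nscount, self.arcount)

    @classmethod
    def unpack(cls, data: bytes) -> Header:
        ident, flags, qd, an, ns, ar = struct.unpack("!6H", data[:12])
        return cls(ident, qr=bool(flags & QR_MASK),
                   opcode=(flags >> OPCODE_SHIFT) & 0xF,
                   aa=bool(flags & AA_MASK), tc=bool(flags & TC_MASK),
                   rd=bool(flags & RD_MASK), ra=bool(flags & RA_MASK),
                   rcode=flags & RCODE_MASK,
                   qdcount=qd, ancount=an, nscount=ns, arcount=ar)
```

      A response such as the CQUERYM example earlier in this memo
      (ID=410, OPCODE 2, three answers) round-trips through pack() and
      unpack() unchanged.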

   Question section format

      The question section is used in all kinds of queries other than
      inverse queries.  In responses to inverse queries, this section
      may contain multiple entries; for all other responses it contains
      a single entry.  Each entry has the following format:

                                           1  1  1  1  1  1 
             0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5 
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
           |                                               |
           /                     QNAME                     /
           /                                               /
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
           |                     QTYPE                     |
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
           |                     QCLASS                    |
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

      where:

      QNAME -   a variable number of octets that specify a domain name.
                This field uses the compressed domain name format
                described in the "Domain name representation and
                compression" section of this memo.  This field can be
                used to derive a text string for the domain name.  Note
                that this field may be an odd number of octets; no
                padding is used.

      QTYPE -   a two octet code which specifies the type of the query.
                The values for this field include all codes valid for a
                TYPE field, together with some more general codes which
                can match more than one type of RR.  For example, QTYPE
                might be A and only match type A RRs, or might be MAILA,
                which matches MF and MD type RRs.  The values for this
                field are listed in Appendix 2.

      QCLASS -  a two octet code that specifies the class of the query.
                For example, the QCLASS field is IN for the ARPA
                Internet, CS for the CSNET, etc.  The numerical values
                are defined in Appendix 2.
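      A question entry can be read with a few lines of Python.  This
      sketch handles only uncompressed QNAMEs (a complete parser must
      also follow the pointers described later in this memo), and its
      usage assumes the numeric codes A = 1 and IN = 1; Appendix 2 is
      the authority for those values.  The name parse_question is
      hypothetical.

```python
from __future__ import annotations

import struct


def parse_question(msg: bytes, offset: int) -> tuple[str, int, int, int]:
    """Parse one question entry; return (QNAME, QTYPE, QCLASS, next offset).

    Only uncompressed QNAMEs are handled in this sketch.
    """
    labels = []
    while True:
        length = msg[offset]
        offset += 1
        if length == 0:                   # null label of the root ends QNAME
            break
        if length & 0xC0:
            raise ValueError("compressed QNAME not handled in this sketch")
        labels.append(msg[offset:offset + length].decode("ascii"))
        offset += length
    # QNAME may be an odd number of octets; QTYPE follows with no padding.
    qtype, qclass = struct.unpack_from("!HH", msg, offset)
    return ".".join(labels), qtype, qclass, offset + 4
```

      For QNAME=B the name occupies three octets (1, "B", 0), an odd
      length, and QTYPE begins immediately after it.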

   Resource record format

      The answer, authority, and additional sections all share the same
      format: a variable number of resource records, where the number of
      records is specified in the corresponding count field in the
      header.  Each resource record has the following format:

                                           1  1  1  1  1  1 
             0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5 
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
           |                                               |
           /                                               /
           /                      NAME                     /
           |                                               |
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
           |                      TYPE                     |
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
           |                     CLASS                     |
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
           |                      TTL                      |
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
           |                   RDLENGTH                    |
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--|
           /                     RDATA                     /
           /                                               /
           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

      where:

      NAME    - a compressed domain name to which this resource record
                pertains.

      TYPE    - two octets containing one of the RR type codes defined
                in Appendix 2.  This field specifies the meaning of the
                data in the RDATA field.

      CLASS   - two octets which specify the class of the data in the
                RDATA field.

      TTL     - a 16 bit unsigned integer that specifies the time
                interval (in seconds) that the resource record may be
                cached before it should be discarded.  Zero values are
                interpreted to mean that the RR can only be used for the
                transaction in progress, and should not be cached.  For
                example, SOA records are always distributed with a zero
                TTL to prohibit caching.  Zero values can also be used
                for extremely volatile data.

      RDLENGTH- an unsigned 16 bit integer that specifies the length in
                octets of the RDATA field.

      RDATA   - a variable length string of octets that describes the
                resource.  The format of this information varies
                according to the TYPE and CLASS of the resource record.
                For example, if the TYPE is A and the CLASS is IN, the
                RDATA field is a 4 octet ARPA Internet address.

      Formats for particular resource records are shown in Appendices 2
      and 3.
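      The fixed part of a resource record is easy to decode once the
      NAME has been consumed.  The Python sketch below assumes the
      caller has already skipped NAME, reads TTL as the 16 bit field
      described above, and assumes the codes A = 1 and IN = 1 for the
      A/IN formatting helper; RR, parse_rr_fields, and a_in_address are
      hypothetical names.

```python
from __future__ import annotations

import struct
from typing import NamedTuple


class RR(NamedTuple):
    rtype: int
    rclass: int
    ttl: int
    rdata: bytes


def parse_rr_fields(msg: bytes, offset: int) -> tuple[RR, int]:
    """Parse TYPE, CLASS, TTL, RDLENGTH and RDATA of one RR.

    `offset` must point just past the RR's NAME.  TTL is read as the 16 bit
    field shown in the diagram above.
    """
    rtype, rclass, ttl, rdlength = struct.unpack_from("!HHHH", msg, offset)
    offset += 8
    rdata = bytes(msg[offset:offset + rdlength])
    if len(rdata) != rdlength:
        raise ValueError("RDLENGTH runs past the end of the message")
    return RR(rtype, rclass, ttl, rdata), offset + rdlength


def a_in_address(rr: RR) -> str:
    """Format the 4 octet RDATA of an A record in class IN (assumed codes 1, 1)."""
    if (rr.rtype, rr.rclass) != (1, 1) or len(rr.rdata) != 4:
        raise ValueError("not an A record of class IN")
    return ".".join(str(octet) for octet in rr.rdata)
```

      Checking the slice against RDLENGTH, rather than trusting it,
      keeps a malformed RR from being read past the end of the message.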

   Domain name representation and compression

      Domain names in messages are expressed in terms of a sequence of
      labels.  Each label is represented as a one octet length field
      followed by that number of octets.  Since every domain name ends
      with the null label of the root, a compressed domain name is
      terminated by a length octet of zero.  The high order two bits of
      the length field must be zero, and the remaining six bits of the
      length field limit the label to 63 octets or less.

      To simplify implementations, the total length of label octets and
      label length octets that make up a domain name is restricted to
      255 octets or less.  Since the trailing root label is not printed,
      and the length octets of the remaining labels are replaced by the
      dots that separate them (one fewer dot than labels), printed
      domain names are 253 octets or less.
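      The label and total length limits can be enforced while encoding
      a name.  This Python sketch writes names without compression and
      treats the empty string as the root; encode_name is a
      hypothetical helper, and labels are assumed to be ASCII text.

```python
def encode_name(name: str) -> bytes:
    """Encode a dotted domain name as length-prefixed labels (no compression)."""
    out = bytearray()
    if name:                                   # "" denotes the root
        for label in name.split("."):
            raw = label.encode("ascii")
            if not 1 <= len(raw) <= 63:
                raise ValueError(f"label length {len(raw)} out of range: {label!r}")
            out.append(len(raw))               # high two bits stay zero
            out += raw
    out.append(0)                              # null label of the root
    if len(out) > 255:
        raise ValueError("encoded name exceeds 255 octets")
    return bytes(out)
```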

      Although a label may contain any 8 bit value in any of its
      octets, it is strongly recommended that labels follow the syntax
      described in Appendix 1 of this memo, which is compatible with
      existing host naming conventions.  Name servers and resolvers
      must compare labels in a case-insensitive manner, i.e. A=a, and
      hence all character strings must be ASCII with zero parity.
      Non-alphabetic codes must match exactly.

      Whenever possible, name servers and resolvers must preserve all 8
      bits of domain names they process.  When a name server is given
      data for the same name under two different case usages, this
      preservation is not always possible.  For example, if a name
      server is given data for ISI.ARPA and isi.arpa, it should create a
      single node, not two, and hence will preserve a single casing of
      the label.  Systems with case sensitivity should take special
      precautions to ensure that the domain data for the system is
      created with consistent case.
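      One way to meet both the case-insensitive comparison rule and the
      "single node, single casing" behavior is to key nodes by a
      case-folded copy of the name while storing the first spelling
      seen.  This is a Python sketch under that assumption; NodeTable
      and names_equal are hypothetical.

```python
from __future__ import annotations


def names_equal(a: bytes, b: bytes) -> bool:
    """Compare two names case-insensitively.

    bytes.lower() folds only ASCII A-Z, so every other octet (digits,
    hyphens, 8 bit values) must still match exactly.
    """
    return a.lower() == b.lower()


class NodeTable:
    """Maps names to RRs, merging spellings that differ only in case.

    The first spelling seen becomes the preserved casing of the node.
    """

    def __init__(self) -> None:
        self._nodes: dict[bytes, tuple[bytes, list]] = {}

    def add(self, name: bytes, rr: object) -> bytes:
        spelling, rrs = self._nodes.setdefault(name.lower(), (name, []))
        rrs.append(rr)
        return spelling          # the casing actually stored for this node

    def __len__(self) -> int:
        return len(self._nodes)
```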

      In order to reduce the amount of space used by repetitive domain
      names, the sequence of octets that defines a domain name may be
      terminated by a pointer to the length octet of a previously
      specified label string.  The label string that the pointer
      specifies is appended to the already specified label string.
      Exact duplication of a previous label string can be done with a
      single pointer.  Multiple levels are allowed: the label string
      that a pointer refers to may itself end with a pointer.

      Pointers can only be used in positions in the message where the
      format is not class specific.  If this were not the case, a name
      server that was handling a RR for another class could make
      erroneous copies of RRs.  As yet, there are no such cases, but
      they may occur in future RDATA formats.

      If a domain name is contained in a part of the message subject to
      a length field (such as the RDATA section of an RR), and
      compression is used, the length of the compressed name is used in
      the length calculation, rather than the length of the expanded
      name.

      Pointers are represented as a two octet field in which the high
      order 2 bits are ones, and the low order 14 bits specify an offset
      from the start of the message.  The 01 and 10 values of the high
      order bits are reserved for future use and should not be used.

      Programs are free to avoid using pointers in datagrams they
      generate, although this will reduce datagram capacity.  However
      all programs are required to understand arriving messages that
      contain pointers.
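      A sender that does use compression can remember the offset of
      every suffix it has written and emit a pointer when a later name
      ends with one of them.  The Python sketch below is one such
      approach (NameCompressor is a hypothetical name); it compares
      suffixes case-insensitively, and the caller remains responsible
      for using it only in positions that are not class specific.

```python
from __future__ import annotations


class NameCompressor:
    """Appends domain names to a message buffer, replacing any suffix that was
    already written with a two-octet pointer (high bits 11, 14-bit offset).

    Label lengths are not validated here; see the 63/255 octet limits above.
    """

    def __init__(self, prefix: bytes = b"") -> None:
        self.buf = bytearray(prefix)     # e.g. an already packed header
        self._suffixes: dict[tuple[bytes, ...], int] = {}

    def add(self, name: str) -> int:
        """Append `name`; return the offset at which its encoding starts."""
        start = len(self.buf)
        labels = [lab.encode("ascii") for lab in name.split(".")] if name else []
        for i, label in enumerate(labels):
            key = tuple(lab.lower() for lab in labels[i:])  # case-insensitive
            if key in self._suffixes:
                pointer = 0xC000 | self._suffixes[key]
                self.buf += pointer.to_bytes(2, "big")
                return start
            if len(self.buf) < 0x4000:   # pointers can only reach 14-bit offsets
                self._suffixes[key] = len(self.buf)
            self.buf.append(len(label))
            self.buf += label
        self.buf.append(0)               # no pointer used: end with the root label
        return start
```

      Writing F.ISI.ARPA, FOO.F.ISI.ARPA, ARPA, and the root into an
      empty buffer this way produces 12, 6, 2, and 1 octets
      respectively, with both later names ending in pointers into the
      first.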

      For example, a datagram might need to use the domain names
      F.ISI.ARPA, FOO.F.ISI.ARPA, ARPA, and the root.  Ignoring the
      other fields of the message, these domain names might be
      represented as:

             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
          20 |           1           |           F           |
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
          22 |           3           |           I           |
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
          24 |           S           |           I           |
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
          26 |           4           |           A           |
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
          28 |           R           |           P           |
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
          30 |           A           |           0           |
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
          40 |           3           |           F           |
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
          42 |           O           |           O           |
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
          44 | 1  1|                20                       |
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
          64 | 1  1|                26                       |
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
          92 |           0           |                       |
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

      The domain name for F.ISI.ARPA is shown at offset 20.  The domain
      name FOO.F.ISI.ARPA is shown at offset 40; this definition uses a
      pointer to concatenate a label for FOO to the previously defined
      F.ISI.ARPA.  The domain name ARPA is defined at offset 64 using a
      pointer to the ARPA component of the name F.ISI.ARPA at 20; note
      that this reference relies on ARPA being the last label in the
      string at 20.  The root domain name is defined by a single octet
      of zeros at 92; the root domain name has no labels.
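      The following Python sketch decodes names like those in the
      example, following pointers and rejecting the reserved 01 and 10
      tags.  It rebuilds only the name octets of the example message
      (every other octet is left as zero), and decode_name and EXAMPLE
      are hypothetical names.  The returned offset is where parsing
      resumes in the original position: just past the first pointer,
      if one was followed.

```python
from __future__ import annotations


def decode_name(msg: bytes, offset: int) -> tuple[str, int]:
    """Decode a possibly compressed name that starts at `offset`.

    Returns (dotted name, offset just past the name where it was found);
    the root is returned as "".
    """
    labels: list[str] = []
    end = None          # fixed by the first pointer, if any
    visited = set()     # pointer targets already followed, to stop loops
    while True:
        length = msg[offset]
        tag = length & 0xC0
        if tag == 0xC0:                          # pointer
            if end is None:
                end = offset + 2
            target = ((length & 0x3F) << 8) | msg[offset + 1]
            if target in visited:
                raise ValueError("pointer loop")
            visited.add(target)
            offset = target
        elif tag:                                 # 01 and 10 are reserved
            raise ValueError("reserved label type")
        elif length == 0:                         # null label of the root
            return ".".join(labels), (offset + 1 if end is None else end)
        else:
            labels.append(msg[offset + 1:offset + 1 + length].decode("ascii"))
            offset += 1 + length


# The example message above (only the name octets; other fields left zero).
EXAMPLE = bytearray(93)
EXAMPLE[20:32] = b"\x01F\x03ISI\x04ARPA\x00"   # F.ISI.ARPA
EXAMPLE[40:46] = b"\x03FOO\xc0\x14"            # FOO + pointer to 20
EXAMPLE[64:66] = b"\xc0\x1a"                   # pointer to 26 (ARPA)
# EXAMPLE[92] is already 0: the root
```

      With this buffer, decode_name(EXAMPLE, 40) returns
      ("FOO.F.ISI.ARPA", 46) and decode_name(EXAMPLE, 92) returns
      ("", 93).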

   Organization of the Shared database

      While name server implementations are free to use any internal
      data structures they choose, the suggested structure consists of
      several separate trees.  Each tree has structure corresponding to
      the domain name space, with RRs attached to nodes and leaves.
      Each zone of authoritative data has a separate tree, and one tree
      holds all non-authoritative data.  All of the trees corresponding
      to zones are managed identically, but the non-authoritative or
      cache tree has different management procedures.

      Data stored in the database can be kept in whatever form is
      convenient for the name server, so long as it can be transformed
      back into the format needed for messages.  In particular, the
      database will probably use structure in place of expanded domain
      names, and will also convert many of the time intervals used in
      the domain system to absolute local times.

      Each tree corresponding to a zone has complete information for a
      "pruned" subtree of the domain space.  The top node of a zone has
      a SOA record that marks the start of the zone.  The bottom edge of
      the zone is delimited by nodes containing NS records signifying
      delegation of authority to other zones, or by leaves of the domain
      tree.  When a name server contains abutting zones, one tree will
      have a bottom node containing a NS record, and the other tree will
      begin with a tree location containing a SOA record.

      Note that there is one special case that requires consideration
      when a name server is implemented.  A node that contains a SOA RR
      denoting a start of zone will also have NS records that identify
      the name servers that are expected to have a copy of the zone.
      Thus a name server will usually find itself (and possibly other
      redundant name servers) referred to in NS records occupying the
      same position in the tree as SOA records.  The solution to this
      problem is to never interpret a NS record as delimiting a zone
      started by a SOA at the same point in the tree.  (The sample
      programs in this memo deal with this problem by processing SOA
      records only after NS records have been processed.)

      Zones may also overlap a particular part of the name space when
      they are of different classes.

      Other than the abutting and separate class cases, trees are always
      expected to be disjoint.  Overlapping zones are regarded as a
      non-fatal error.  The scheme described in this memo avoids the
      overlap issue by maintaining separate trees; other designs must
      take the appropriate measures to defend against possible overlap.

      Non-authoritative data is maintained in a separate tree.  This
      tree is unlike the zone trees in that it may have "holes".  Each
      RR in the cache tree has its own TTL that is separately managed.
      The data in this tree is never used if authoritative data is
      available from a zone tree; this avoids potential problems due to
      cached data that conflicts with authoritative data.
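      The per-RR TTL management and the rule that authoritative data
      takes precedence can be sketched in Python as follows.  The
      sketch converts each TTL to an absolute expiry time when the RR
      is cached, declines to cache RRs with a zero TTL, and keys
      entries by (name, type, class); CacheTree, answer, and the
      zone_lookup callback (which returns None when no zone tree holds
      the name) are hypothetical.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


class CacheTree:
    """Non-authoritative cache in which every RR expires on its own (sketch).

    TTLs are converted to absolute expiry times when an RR is added.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[tuple[str, int, int], list[tuple[object, float]]] = {}

    def add(self, name: str, rtype: int, rclass: int, rdata: object, ttl: int) -> None:
        if ttl == 0:
            return  # TTL zero: usable only for the current transaction
        key = (name.lower(), rtype, rclass)
        self._entries.setdefault(key, []).append((rdata, self._clock() + ttl))

    def lookup(self, name: str, rtype: int, rclass: int) -> list:
        key = (name.lower(), rtype, rclass)
        now = self._clock()
        live = [(d, exp) for d, exp in self._entries.get(key, []) if exp > now]
        if live:
            self._entries[key] = live          # drop expired RRs lazily
        else:
            self._entries.pop(key, None)
        return [d for d, _ in live]


def answer(zone_lookup: Callable[[str, int, int], Optional[list]],
           cache: CacheTree, name: str, rtype: int, rclass: int):
    """Return (rrs, authoritative); the cache is consulted only when no zone
    tree supplies the data."""
    rrs = zone_lookup(name, rtype, rclass)
    if rrs is not None:
        return rrs, True
    return cache.lookup(name, rtype, rclass), False
```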

      The shared database will also contain data structures to support
      the processing of inverse queries and completion queries if the
      local system supports these optional features.  Although many
      schemes are possible, this memo describes a scheme that is based
      on tables of pointers that invert the database according to key.

      Each kind of retrieval has a separate set of tables, with one
      table per zone.  When a zone is updated, these tables must also be
      updated.  The contents of these tables are discussed in the
      "Inverse query processing" and "Completion query processing"
      sections of this memo.

      The database implementation described here includes two locks that
      are used to control concurrent access and modification of the
      database by name server query processing, name server maintenance
      operations, and resolver access:

         The first lock ("main lock") controls access to all of the
         trees.  Multiple concurrent reads are allowed, but write access
         can only be acquired by a single process.  Read and write
         access are mutually exclusive.  Resolvers and name server
         processes that answer queries acquire this lock in read mode,
         and unlock upon completion of the current message.  This lock
         is acquired in write mode by a name server maintenance process
         when it is about to change data in the shared database.  The
         actual update procedures are described under "NAME SERVER
         MAINTENANCE" but are designed to be brief.

         The second lock ("cache queue lock") controls access to the
         cache queue.  This queue is used by a resolver that wishes to
         add information to the cache tree.  The resolver acquires this
         lock, then places the RRs to be cached into the queue.  The
         name server maintenance procedure periodically acquires this
         lock and adds the queue information to the cache.  The
         rationale for this procedure is that it allows the resolver to
         operate with read-only access to the shared database, and
         allows the update process to batch cache additions and the
         associated costs for inversion calculations.  The name server
         maintenance procedure must take appropriate precautions to
         avoid problems with data already in the cache, inversions, etc.
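      Python's threading module has no readers-writer lock, so the
      sketch below builds the main lock from a threading.Condition and
      uses a plain threading.Lock for the cache queue.  MainLock,
      SharedDatabase, and the list standing in for the cache tree are
      hypothetical; the sketch does not prevent a steady stream of
      readers from starving a writer.

```python
import threading


class MainLock:
    """Readers-writer lock: many concurrent readers, or exactly one writer."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self) -> None:
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()   # a waiting writer may proceed

    def acquire_write(self) -> None:
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()       # wake readers and writers


class SharedDatabase:
    def __init__(self) -> None:
        self.main_lock = MainLock()
        self.cache_queue_lock = threading.Lock()
        self.cache_queue: list = []
        self.cache: list = []             # stand-in for the cache tree

    def resolver_enqueue(self, rrs: list) -> None:
        # Resolvers never need write access to the trees, only the queue lock.
        with self.cache_queue_lock:
            self.cache_queue.extend(rrs)

    def maintenance_flush(self) -> int:
        # Take the queued RRs under the queue lock, then add them in one batch
        # under the main lock in write mode.
        with self.cache_queue_lock:
            batch, self.cache_queue = self.cache_queue, []
        self.main_lock.acquire_write()
        try:
            self.cache.extend(batch)      # a server would also update inversions
        finally:
            self.main_lock.release_write()
        return len(batch)
```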

      This organization solves several difficulties:

         When searching the domain space for the answer to a query, a
         name server can restrict its search for authoritative data to
         the tree whose top node matches the most labels on the right
         side of the domain name of interest.

         Since updates to a zone must be atomic with respect to
         searches, maintenance operations can simply acquire the main
         lock, insert a new copy of a particular zone without disturbing
         other zones, and then release the storage used by the old copy.
         Assuming a central table pointing to valid zone trees, this
         operation can be a simple pointer swap.

         TTL management of zones can be performed using the SOA record
         for the zone.  This avoids potential difficulties if individual
         RRs in a zone could be timed out separately.  This issue is
         discussed further in the maintenance section.

   Query processing

      The following algorithm outlines processing that takes place at a
      name server when a query arrives:

      1. Search the list of zones to find zones which have the same
         class as the QCLASS field in the query and have a top domain
         name that matches the right end of the QNAME field.  If there
         are none, go to step 2.  Otherwise, pick the zone that has the
         longest match (the only zone, if just one matches) and go to
         step 3.

      2. Since the zone search failed, the only possible RRs are
         contained in the non-authoritative tree.  Search the cache tree
         for the NS record that has the same class as the QCLASS field
         and the largest right end match for the domain name.  Add the NS
         record or records to the authority section of the response.  If
         the cache tree has RRs that are pertinent to the question
         (domain names match, classes agree, not timed-out, and the type
         field is relevant to the QTYPE), copy these RRs into the answer
         section of the response.  The name server may also search the
         cache queue.  Go to step 4.

      3. Since this zone is the best match, the zone in which QNAME
         resides is either this zone or a zone to which this zone will
         directly or indirectly delegate authority.  Search down the
         tree looking for a NS RR or the node specified by QNAME.

            If the node exists and has no NS record, copy the relevant
            RRs to the answer section of the response and go to step 4.

            If a NS RR is found, either matching a part or all of QNAME,
            then QNAME is in a delegated zone outside of this zone.  If
            so, copy the NS record or records into the authority section
            of the response, and search the remainder of the zone for an
            A type record corresponding to the NS reference.  If the A
            record is found, add it to the additional section.  Go to
            step 2.

            If the node is not found and a NS is not found, there is no
            such name; set the RCODE of the response to Name Error (3)
            and exit.

      4. When this step is reached, the answer and authority sections
         are complete.  What remains is to complete the additional
         section.  This procedure is only possible if the name server
         knows the data formats implied by the class of records in the
         answer and authority sections.  Hence this procedure is class
         dependent.  Appendix 3 discusses this procedure for Internet
         class data.
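      Step 1 can be sketched in Python as a longest-suffix search over
      (class, top name) pairs.  The sketch assumes "matches the right
      end" means whole labels compared case-insensitively, so that
      XISI.ARPA falls under ARPA but not under ISI.ARPA; best_zone is a
      hypothetical name, and "" stands for the root.

```python
from __future__ import annotations

from typing import Optional


def _labels(name: str) -> list[str]:
    # "" is the root, which has no labels; compare labels case-insensitively.
    return [label.lower() for label in name.split(".")] if name else []


def best_zone(zones: list[tuple[int, str]], qclass: int,
              qname: str) -> Optional[tuple[int, str]]:
    """Step 1: return the (class, top name) zone with the longest right-end
    match for QNAME, or None (which means: continue with step 2)."""
    qlabels = _labels(qname)
    best, best_len = None, -1
    for zone in zones:
        zone_class, top = zone
        tlabels = _labels(top)
        n = len(tlabels)
        if zone_class != qclass or n > len(qlabels) or n <= best_len:
            continue
        if n == 0 or qlabels[-n:] == tlabels:   # whole labels must match
            best, best_len = zone, n
    return best
```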

      While this algorithm deals with typical queries and databases,
      several additions are required that will depend on the database
      supported by the name server:

      QCLASS=*

         Special procedures are required when the QCLASS of the query is
         "*".  If the database contains several classes of data, the
         query processing steps above are performed separately for each
         CLASS, and the results are merged into a single response.  The
         name error condition is not meaningful for a QCLASS=* query.
         If the requestor wants this information, it must test each
         class independently.

         If the database is limited to data of a particular class, this
         operation can be performed by simply resetting the authoritative
         bit in the response, and performing the query as if QCLASS were
         the class used in the database.
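
      The multi-class case can be sketched in Python as follows.  This
      assumes a hypothetical database that maps each CLASS to a
      dictionary of owner names and (TYPE, RDATA) pairs, and
      lookup_class is a hypothetical stand-in for steps 1 through 4
      within a single class.

```python
# Hypothetical multi-class database: CLASS -> {owner name: [(TYPE, RDATA), ...]}.
DATABASE = {
    "IN": {"B.ISI.ARPA": [("A", "10.3.0.52")]},
    "CS": {"UDEL.CSNET": [("A", "302-555-0000")]},
}


def lookup_class(data, qname, qtype):
    """Stand-in for steps 1-4 within one class; None means 'no such name'."""
    rrs = data.get(qname)
    if rrs is None:
        return None
    return [(t, d) for t, d in rrs if qtype in ("*", t)]


def query_any_class(qname, qtype, database=DATABASE):
    """QCLASS=*: search each CLASS separately and merge the results."""
    answer = []
    for qclass, data in sorted(database.items()):
        rrs = lookup_class(data, qname, qtype)
        # A name error in one class is ignored; it says nothing about the others.
        if rrs:
            answer += [(qname, t, qclass, d) for t, d in rrs]
    return answer  # never accompanied by a name error
```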

      * labels in database RRs

         Some zones will contain default RRs that use * to match in
         cases where the search fails for a particular domain name.  If
         the database contains these records then a failure must be
         retried using * in place of one or more labels of the search
         key.  The procedure is to replace labels from the left with
         "*"s looking for a match until either all labels have been
         replaced, or a match is found.  Note that these records can
         never be the result of caching, so a name server can omit this
         processing for zones that don't contain RRs with * in labels,
         or can omit this processing entirely if * never appears in
         local authoritative data.
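
      The following is a sketch of the wildcard retry that reads "replace
      labels from the left" literally.  Because the label count is
      preserved, *.ISI.ARPA can match X.ISI.ARPA but not X.Y.ISI.ARPA.
      The dictionary database and the function name are hypothetical.

```python
def wildcard_lookup(qname, database):
    """Exact lookup, then retry with '*' replacing labels from the left."""
    if qname in database:
        return qname, database[qname]
    labels = qname.split(".")
    for i in range(1, len(labels) + 1):
        # Replace the leftmost i labels with "*" and try again.
        key = ".".join(["*"] * i + labels[i:])
        if key in database:
            return key, database[key]
    return None  # every label replaced without finding a match
```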

   Inverse query processing

      Name servers that support inverse queries can support these
      operations through exhaustive searches of their databases, but
      this becomes impractical as the size of the database increases.
      An alternative approach is to invert the database according to the
      search key.

      For name servers that support multiple zones and a large amount of
      data, the recommended approach is separate inversions for each
      zone.  When a particular zone is changed during a refresh, only
      its inversions need to be redone.
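
      The following is a minimal sketch of per-zone inversions.  It
      assumes each inversion is a Python list of (RDATA, owner) pairs
      kept sorted so that it can be binary-searched with the standard
      bisect module.  The zone data and the function names are
      hypothetical.

```python
import bisect


def build_inversion(zone_rrs, rtype="A"):
    """Invert one zone's RRs of one TYPE into a sorted list of (RDATA, owner)."""
    return sorted((rdata, owner) for owner, t, rdata in zone_rrs if t == rtype)


def inverse_lookup(inversion, rdata):
    """Binary-search an inversion for every owner whose RR carries this RDATA."""
    i = bisect.bisect_left(inversion, (rdata,))  # (rdata,) sorts before (rdata, owner)
    owners = []
    while i < len(inversion) and inversion[i][0] == rdata:
        owners.append(inversion[i][1])
        i += 1
    return owners


# One inversion per zone: refreshing a zone rebuilds only its own entry.
inversions = {
    "ARPA": build_inversion([("UDEL.ARPA", "A", "10.0.0.96"), ("NBS.ARPA", "A", "10.0.0.19")]),
    "ISI.ARPA": build_inversion([("B.ISI.ARPA", "A", "10.3.0.52"), ("F.ISI.ARPA", "A", "10.2.0.52")]),
}
```

      After a refresh of ISI.ARPA, only inversions["ISI.ARPA"] needs to
      be reassigned from the new zone data.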

      Support for transfer of this type of inversion may be included in
      future versions of the domain system, but is not supported in this
      version.

   Completion query processing

      Completion query processing shares many of the same problems in
      data structure design as are found in inverse queries, but is
      different due to the expected high rate of use of top level labels
      (i.e., ARPA, CSNET).  A name server that wishes to be efficient in
      its use of memory may well choose to invert only occurrences of
      ARPA, etc. that are below the top level, and use a search for the
      rare case that top level labels are used to constrain a
      completion.

NAME SERVER MAINTENANCE

   Introduction

      Name servers perform maintenance operations on their databases to
      ensure that the data they distribute is accurate and timely.  The
      amount and complexity of the maintenance operations that a name
      server must perform are related to the size, change rate, and
      complexity of the database that the name server manages.

      Maintenance operations are fundamentally different for
      authoritative and non-authoritative data.  A name server actively
      attempts to ensure the accuracy and timeliness of authoritative
      data by refreshing the data from master copies.  Non-authoritative
      data is merely purged when its time-to-live expires; the name
      server does not attempt to refresh it.

      Although the refreshing scheme is fairly simple to implement, it
      is somewhat less powerful than schemes used in other distributed
      database systems.  In particular, an update to the master does not
      immediately update copies, and should be viewed as gradually
      percolating through the distributed database.  This is adequate for
      the vast majority of applications.  In situations where timeliness
      is critical, the master name server can prohibit caching of copies
      or assign short timeouts to copies.

   Conceptual model of maintenance operations

      The vast majority of information in the domain system is derived
      from master files scattered among hosts that implement name
      servers; some name servers will have no master files, other name
      servers will have one or more master files.  Each master file
      contains the master data for a single zone of authority rather
      than data for the whole domain name space.  The administrator of a
      particular zone controls that zone by updating its master file.

      Master files and zone copies from remote servers may include RRs
      that are outside of the zone of authority when a NS record
      delegates authority to a name server whose domain name is a
      descendant of the domain name at which authority is delegated.
      These forward references are a problem because there is no
      reasonable method to guarantee that the A type records for the
      delegatee are available unless they can somehow be attached to
      the NS records.

      For example, suppose the ARPA zone delegates authority at
      MIT.ARPA, and states that the name server is on AI.MIT.ARPA.  If a
      resolver gets the NS record but not the A type record for
      AI.MIT.ARPA, it might try to ask the MIT name server for the
      address of AI.MIT.ARPA, a server it cannot reach without already
      knowing that address.

      The solution is to allow type A records that are outside of the
      zone of authority to be copied with the zone.  While these records
      won't be found in a search for the A type record itself, they can
      be protected by the zone refreshing system, and will be passed
      back whenever the name server passes back a referral to the
      corresponding NS record.  If a query is received for the A record,
      the name server will pass back a referral to the name server with
      the A record in the additional section, rather than in the answer
      section.

      The only exception to the use of master files is a small amount of
      data stored in boot files.  Boot file data is used by name servers
      to provide enough resource records to allow zones to be imported
      from foreign servers (e.g. the address of the server), and to
      establish the name and address of root servers.  Boot file records
      establish the initial contents of the cache tree, and hence can be
      overridden by later loads of authoritative data.

      The data in a master file first becomes available to users of the
      domain name system when it is loaded by the corresponding name
      server.  By definition, data from a master file is authoritative.

      Other name servers which wish to be authoritative for a particular
      zone do so by transferring a copy of the zone from the name server
      which holds the master copy using a virtual circuit.  These copies
      include parameters which specify the conditions under which the
      data in the copy is authoritative.  In the most common case, the
      conditions specify a refresh interval and policies to be followed
      when the refresh operation cannot be performed.

      A name server may acquire multiple zones from different name
      servers and master files, but the name server must maintain each
      zone separately from others and from non-authoritative data.

      When the refresh interval for a particular zone copy expires, the
      name server holding the copy must consult the name server that
      holds the master copy.  If the data in the zone has not changed,
      the master name server instructs the copy name server to reset the
      refresh interval.  If the data has changed, the master passes a
      new copy of the zone and its associated conditions to the copy
      name server.  Following either of these transactions, the copy
      name server begins a new refresh interval.

      Copy name servers must also deal with error conditions under which
      they are unable to communicate with the name server that holds the
      master copy of a particular zone.  The policies that a copy name
      server uses are determined by other parameters in the conditions
      distributed with every copy.  The conditions include a retry
      interval and a maximum holding time.  When a copy name server is
      unable to establish communications with a master or is unable to
      complete the refresh transaction, it must retry the refresh
      operation at the rate specified by the retry interval.  This retry
      interval will usually be substantially shorter than the refresh
      interval.  Retries continue until the maximum holding time is
      reached.  At that time the copy name server must assume that its
      copy of the data for the zone in question is no longer
      authoritative.
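
      The refresh, retry, and maximum holding time rules can be sketched
      in Python as follows.  The ZoneCopy fields and next_check_delay
      are hypothetical, and the transfer itself is reduced to a success
      flag.  The values in the usage example are taken from the SOA in
      the ARPA master file under "Name server file loading example".

```python
from dataclasses import dataclass


@dataclass
class ZoneCopy:
    """Hypothetical status-table fields for one zone copy (seconds)."""
    refresh: int
    retry: int
    expire: int             # maximum holding time
    last_refreshed: float
    authoritative: bool = True


def next_check_delay(zone_copy, now, refresh_succeeded):
    """Update the copy after a refresh attempt; return seconds until the next one."""
    if refresh_succeeded:
        # "Unchanged" and "new copy" both start a new refresh interval.
        zone_copy.last_refreshed = now
        zone_copy.authoritative = True
        return zone_copy.refresh
    if now - zone_copy.last_refreshed >= zone_copy.expire:
        zone_copy.authoritative = False  # maximum holding time reached
    return zone_copy.retry  # keep trying at the shorter retry interval
```

      For example, ZoneCopy(3600, 600, 3600000, 0.0) is checked hourly
      while the master is reachable, and every ten minutes while it is
      not.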

      Queries must be processed while maintenance operations are in
      progress because a zone transfer can take a long time.  However,
      to avoid problems caused by access to partial databases, the
      maintenance operations create new copies of data rather than
      directly modifying the old copies.  When the new copy is complete,
      the maintenance process locks out queries for a short time using
      the main lock, and switches pointers to replace the old data with
      the new.  After the pointers are swapped, the maintenance process
      unlocks the main lock and reclaims the storage used by the old
      copy.
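
      The following Python sketch shows the copy-then-swap update.  It
      deviates slightly from the model: queries hold the main lock only
      long enough to read the data reference, and Python's garbage
      collector stands in for explicit reclamation of the old copy.  The
      class and method names are hypothetical.

```python
import copy
import threading


class ZoneStore:
    """Hypothetical holder for one zone's data, updated by copy-then-swap."""

    def __init__(self, data):
        self.main_lock = threading.Lock()
        self.data = data  # queries always reach the data through this reference

    def query(self, name):
        with self.main_lock:
            current = self.data  # read the current pointer under the lock
        return current.get(name)  # search without blocking maintenance

    def refresh(self, update):
        """Build a modified copy off-lock, then swap it in under the main lock."""
        # Only the maintenance process replaces self.data, so reading it here is safe.
        new = copy.deepcopy(self.data)
        update(new)  # may take a long time; queries keep using the old copy
        with self.main_lock:
            old, self.data = self.data, new  # brief lock-out: switch pointers
        # Once in-flight queries drop their references, Python reclaims the old copy.
        del old
```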

   Name server data structures and top level logic

      The name server must multiplex its attention between multiple
      activities.  For example, a name server should be able to answer
      queries while it is also performing refresh activities for a
      particular zone.  While it is possible to design a name server
      that devotes a separate process to each query and refresh activity
      in progress, the model described in this memo is based on the
      assumption that there is a single process performing all
      maintenance operations, and one or more processes devoted to
      handling queries.  The model also assumes the existence of shared
      memory for several control structures, the domain database, locks,
      etc.

      The model name server uses the following files and shared data
      structures:

         1. A configuration file that describes the master and boot
            files which the name server should load and the zones that
            the name server should attempt to load from foreign name
            servers.  This file establishes the initial contents of the
            status table.

         2. Domain data files that contain master and boot data to be
            loaded.

         3. A status table that is derived from the configuration file.
            Each entry in this table describes a source of data.  Each
            entry has a zone number.  The zone number is zero for
            non-authoritative sources; authoritative sources are
            assigned separate non-zero numbers.

         4. The shared database that holds the domain data.  This
            database is assumed to be organized in some sort of tree
            structure paralleling the domain name space, with a list of
            resource records attached to each node and leaf in the tree.
            The elements of the resource record list need not contain
            the exact data present in the corresponding output format,
            but must contain data sufficient to create the output
            format; for example, these records need not contain the
            domain name that is associated with the resource because
            that name can be derived from the tree structure.  Each
            resource record also contains internal data that the name
            server uses to organize its data.

         5. Inversion data structures that allow the name server to
            process inverse queries and completion queries.  Although
            many structures could be used, the implementation described
            in this memo supposes that there is one array for every
            inversion that the name server can handle.  Each array
            contains a list of pointers to resource records such that
            the order of the inverted quantities is sorted.

         6. The main and cache queue locks

         7. The cache queue

      The maintenance process begins by loading the status table from
      the configuration file.  It then periodically checks each entry,
      to see if its refresh interval has elapsed.  If not, it goes on to
      the next entry.  If so, it performs different operations depending
      on the entry:

         If the entry is for zone 0, or the cache tree, the maintenance
         process checks to see if additions or deletions are required.
         Additions are acquired from the cache queue using the cache
         queue lock.  Deletions are detected using TTL checks.  If any
         changes are required, the maintenance process recalculates
         inversion data structures and then alters the cache tree under
         the protection of the main lock.  Whenever the maintenance
         process modifies the cache tree, it resets the refresh interval
         to the minimum of the contained TTLs and the desired time
         interval for cache additions.

         If the entry is not zone 0, and the entry refers to a local
         file, the maintenance process checks to see if the file has
         been modified since its last load.  If so, the file is reloaded
         using the procedures specified under "Name server file
         loading".  The refresh interval is reset to that specified in
         the SOA record if the file is a master file.

         If the entry is for a remote master file, the maintenance
         process checks for a new version using the procedure described
         in "Name server remote zone transfer".
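
      One pass of the maintenance process over the status table might
      look like the following Python sketch.  The StatusEntry fields and
      the three handler callables are hypothetical.  Each handler is
      assumed to do the work described above and return the refresh
      interval to use next.

```python
from dataclasses import dataclass


@dataclass
class StatusEntry:
    """Hypothetical status-table entry built from the configuration file."""
    zone: int            # 0 for the cache tree; non-zero for authoritative zones
    remote: bool         # True if the zone is copied from a foreign name server
    refresh: int         # current refresh interval, in seconds
    next_check: float = 0.0


def maintenance_pass(table, now, on_cache, on_local_file, on_remote_zone):
    """Visit each entry whose refresh interval has elapsed."""
    for entry in table:
        if now < entry.next_check:
            continue  # interval not yet elapsed; go on to the next entry
        if entry.zone == 0:
            entry.refresh = on_cache(entry)        # TTL checks and cache queue additions
        elif entry.remote:
            entry.refresh = on_remote_zone(entry)  # SOA check against the foreign server
        else:
            entry.refresh = on_local_file(entry)   # reload if the file was modified
        entry.next_check = now + entry.refresh
```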

   Name server file loading

      Master files are kept in text form for ease of editing by system
      maintainers.  These files are not exchanged by name servers; name
      servers use the standard message format when transferring zones.

      Organizations that want to have a domain, but do not want to run a
      name server, can use these files to supply a domain definition to
      another organization that will run a name server for them.  For
      example, if organization X wants a domain but not a name server,
      it can find another organization, Y, that has a name server and is
      willing to provide service for X.  Organization X defines domain X
      via the master file format and ships a copy of the master file to
      organization Y via mail, FTP, or some other method.  A system
      administrator at Y configures Y's name server to read in X's file
      and hence support the X domain.  X can maintain the master file
      using a text editor and send new versions to Y for installation.

      These files have a simple line-oriented format, with one RR per
      line.  Fields are separated by any combination of blanks and tab
      characters.  Tabs are treated the same as spaces; in the following
      discussion the term "blank" means either a tab or a blank.  A line
      can be either blank (and ignored), a RR, or a $INCLUDE line.

      If a RR line starts with a domain name, that domain name is used
      to specify the location in the domain space for the record, i.e.
      the owner.  If a RR line starts with a blank, it is loaded into
      the location specified by the most recent location specifier.

      The location specifiers are assumed to be relative to some origin
      that is provided by the user of a file unless the location
      specifier contains the root label.  This provides a convenient
      shorthand notation, and can also be used to prevent errors in
      master files from propagating into other zones.  This feature is
      particularly useful for master files imported from other sites.

      An include line begins with $INCLUDE, starting at the first line
      position, and is followed by a local file name and an optional
      offset modifier.  The filename follows the appropriate local
      conventions.  The offset is one or more labels that are added to
      the offset in use for the file that contained the $INCLUDE.  If
      the offset is omitted, the included file is loaded using the
      offset of the file that contained the $INCLUDE command.  For
      example, a file being loaded at offset ARPA might contain the
      following lines:

                $INCLUDE <subsys>isi.data ISI           
                $INCLUDE <subsys>addresses.data         

      The first line would be interpreted to direct loading of the file
      <subsys>isi.data at offset ISI.ARPA.  The second line would be
      interpreted as a request to load data at offset ARPA.
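
      The offset rule can be sketched in Python as follows.  It assumes
      file names contain no blanks and the line carries no comment.
      parse_include is a hypothetical name.

```python
def parse_include(line, current_origin):
    """Split a '$INCLUDE <file> [offset]' line into (file name, origin for that file)."""
    fields = line.split()
    # $INCLUDE must start at the first line position; it takes a file and an optional offset.
    if line[:1] in (" ", "\t") or not fields or fields[0] != "$INCLUDE" or len(fields) not in (2, 3):
        raise ValueError(f"not an $INCLUDE line: {line!r}")
    if len(fields) == 2:
        return fields[1], current_origin  # no offset: reuse the including file's origin
    return fields[1], f"{fields[2]}.{current_origin}"  # offset labels prefix the origin
```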

      Note that $INCLUDE commands do not cause data to be loaded into a
      different zone or tree; they are simply ways to allow data for a
      given zone to be organized in separate files.  For example,
      mailbox data might be kept separately from host data using this
      mechanism.

      Resource records are entered as a sequence of fields corresponding
      to the owner name, TTL, CLASS, TYPE and RDATA components.  (Note
      that this order is different from the order used in examples and
      the order used in the actual RRs; the given order allows easier
      parsing and defaulting.)

         The owner name is derived from the location specifier.

         The TTL field is optional, and is expressed as a decimal
         number.  If omitted, the TTL defaults to zero.

         The CLASS field is also optional; if omitted the CLASS defaults
         to the most recent value of the CLASS field in a previous RR.

         The RDATA fields depend on the CLASS and TYPE of the RR.  In
         general, the fields that make up RDATA are expressed as decimal
         numbers or as domain names.  Some exceptions exist, and are
         documented in the RDATA definitions in Appendices 2 and 3 of
         this memo.

      Because CLASS and TYPE fields don't contain any common
      identifiers, and because CLASS and TYPE fields are never decimal
      numbers, the parse is always unique.
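
      The field defaulting and the unique parse can be illustrated with
      the following Python sketch.  The CLASSES and TYPES sets are
      hypothetical subsets, and comments, parentheses, special encodings,
      and origin handling are deliberately left out.

```python
# Hypothetical keyword sets; the full lists are defined in Appendices 2 and 3.
CLASSES = {"IN", "CS"}
TYPES = {"A", "NS", "MD", "MF", "MB", "MG", "SOA"}


def parse_rr_line(line, state):
    """Split one RR line into (owner, TTL, CLASS, TYPE, RDATA fields).

    `state` remembers the most recent owner and CLASS between calls.
    """
    fields = line.split()
    if not fields:
        return None  # blank line: ignored
    if not line[0].isspace():
        state["owner"] = fields.pop(0)  # line starts with a location specifier
    ttl = 0  # an omitted TTL defaults to zero
    if fields and fields[0].isdigit():
        ttl = int(fields.pop(0))
    if fields and fields[0] in CLASSES:
        state["class"] = fields.pop(0)  # otherwise reuse the previous CLASS
    if not fields or fields[0] not in TYPES:
        raise ValueError(f"missing or unknown TYPE in {line!r}")
    rtype = fields.pop(0)
    return state["owner"], ttl, state["class"], rtype, fields
```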

      Because these files are text files, several special encodings are
      necessary to allow arbitrary data to be loaded.  In particular:

         .    A free standing dot is used to refer to the current domain
              name.

         @    A free standing @ is used to denote the current origin.

         ..   Two free standing dots represent the null domain name of
              the root.

         \X   where X is any character other than a digit (0-9), is used
              to quote that character so that its special meaning does
              not apply.  For example, "\." can be used to place a dot
              character in a label.

         \DDD where each D is a digit is the octet corresponding to the
              decimal number described by DDD.  The resulting octet is
              assumed to be text and is not checked for special meaning.

         ( )  Parentheses are used to group data that crosses a line
              boundary.  In effect, line terminations are not recognized
              within parentheses.

         ;    Semicolon is used to start a comment; the remainder of the
              line is ignored.
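
      The backslash encodings can be decoded as in the following Python
      sketch.  It assumes ASCII input, and both function names are
      hypothetical.  A name is first split at dots that are not quoted,
      and each label is then decoded into octets.

```python
def unescape_label(text):
    """Decode the \\X and \\DDD escapes of one label into octets."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out += text[i].encode("ascii")
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError(f"dangling backslash in {text!r}")
        if text[i + 1].isdigit():
            digits = text[i + 1 : i + 4]
            if len(digits) != 3 or not digits.isdigit():
                raise ValueError(f"bad \\DDD escape in {text!r}")
            out.append(int(digits))  # \DDD: the octet with decimal value DDD
            i += 4
        else:
            out += text[i + 1].encode("ascii")  # \X: X loses its special meaning
            i += 2
    return bytes(out)


def split_name(name):
    """Split a domain name at unquoted dots, then decode each label."""
    labels, current, i = [], "", 0
    while i < len(name):
        if name[i] == "\\":
            current += name[i : i + 2]  # keep the escape for unescape_label
            i += 2
        elif name[i] == ".":
            labels.append(current)
            current, i = "", i + 1
        else:
            current, i = current + name[i], i + 1
    labels.append(current)
    return [unescape_label(label) for label in labels]
```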

   Name server file loading example

      A name server for F.ISI.ARPA, serving as an authority for the
      ARPA and ISI.ARPA domains, might use a boot file and two master
      files.  The boot file initializes some non-authoritative data, and
      would be loaded without an origin:

    ..              9999999 IN      NS      B.ISI.ARPA               
                    9999999 CS      NS      UDEL.CSNET               
    B.ISI.ARPA      9999999 IN      A       10.3.0.52                
    UDEL.CSNET      9999999 CS      A       302-555-0000             

      This file loads non-authoritative data which provides the
      identities and addresses of root name servers.  The first line
      contains a NS RR which is loaded at the root; the second line
      starts with a blank, and is loaded at the most recent location
      specifier, in this case the root; the third and fourth lines load
      RRs at B.ISI.ARPA and UDEL.CSNET, respectively.  The timeouts are
      set to high values (9999999) to prevent this data from being
      discarded due to timeout.

      The first master file loads authoritative data for the ARPA
      domain.  This file is designed to be loaded with an origin of
      ARPA, which allows the location specifiers to omit the trailing
      .ARPA labels.

    @   IN  SOA     F.ISI.ARPA       Action.E.ISI.ARPA (             
                                     20     ; SERIAL                 
                                     3600   ; REFRESH                
                                     600    ; RETRY                  
                                     3600000; EXPIRE                 
                                     60)    ; MINIMUM                
            NS      F.ISI.ARPA ; F.ISI.ARPA is a name server for ARPA
            NS      A.ISI.ARPA ; A.ISI.ARPA is a name server for ARPA
    MIT     NS      AI.MIT.ARPA; delegation to MIT name server       
    ISI     NS      F.ISI.ARPA ; delegation to ISI name server       

    UDEL    MD      UDEL.ARPA                                        
            A       10.0.0.96                                        
    NBS     MD      NBS.ARPA                                         
            A       10.0.0.19                                        
    DTI     MD      DTI.ARPA                                         
            A       10.0.0.12                                        

    AI.MIT  A       10.2.0.6                                         
    F.ISI   A       10.2.0.52                                        

      The first group of lines contains the SOA record and its
      parameters, and identifies name servers for this zone and for
      delegated zones.  The Action.E.ISI.ARPA field is a mailbox
      specification for the responsible person for the zone, and is the
      domain name encoding of the mail destination Action@E.ISI.ARPA.
      The second group specifies data for domain names within this zone.
      The last group has forward references for name server address
      resolution for AI.MIT.ARPA and F.ISI.ARPA.  This data is not
      technically within the zone, and will only be used for additional
      record resolution for NS records used in referrals.  However, this
      data is protected by the zone timeouts in the SOA, so it will
      persist as long as the NS references persist.

      The second master file defines the ISI.ARPA environment, and is
      loaded with an origin of ISI.ARPA:

    @   IN  SOA     F.ISI.ARPA      Action\.ISI.E.ISI.ARPA (         
                                     20     ; SERIAL                 
                                     7200   ; REFRESH                
                                     600    ; RETRY                  
                                     3600000; EXPIRE                 
                                     60)    ; MINIMUM                
            NS      F.ISI.ARPA ; F.ISI.ARPA is a name server         
    A       A       10.1.0.32                                        
            MD      A.ISI.ARPA                                       
            MF      F.ISI.ARPA                                       
    B       A       10.3.0.52                                        
            MD      B.ISI.ARPA
            MF      F.ISI.ARPA                                       
    F       A       10.2.0.52                                        
            MD      F.ISI.ARPA                                       
            MF      A.ISI.ARPA                                       
    $INCLUDE <SUBSYS>ISI-MAILBOXES.TXT                               

      Where the file <SUBSYS>ISI-MAILBOXES.TXT is:

    MOE     MB      F.ISI.ARPA                                       
    LARRY   MB      A.ISI.ARPA                                       
    CURLEY  MB      B.ISI.ARPA                                       
    STOOGES MB      B.ISI.ARPA                                       
            MG      MOE.ISI.ARPA                                     
            MG      LARRY.ISI.ARPA                                   
            MG      CURLEY.ISI.ARPA                                  

      Note the use of the \ character in the SOA RR to specify the
      responsible person mailbox "Action.ISI@E.ISI.ARPA".
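
      The mailbox encoding used in both SOA records can be sketched in
      Python as follows.  The local part becomes the first label, and any
      dots (or backslashes) within it are quoted.  The function name is
      hypothetical.

```python
def mailbox_to_domain_name(mailbox):
    """Encode local@domain as a domain name whose first label is the local part."""
    local, _, domain = mailbox.rpartition("@")
    if not local:
        raise ValueError(f"not a mailbox: {mailbox!r}")
    # Quote backslashes first, then dots, so the local part stays one label.
    return local.replace("\\", "\\\\").replace(".", "\\.") + "." + domain
```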

   Name server remote zone transfer

      When a name server needs to make an initial copy of a zone or test
      to see if an existing zone copy should be refreshed, it begins by
      attempting to open a virtual circuit to the foreign name server.

      If this open attempt fails, and this was an initial load attempt,
      it schedules a retry and exits.  If this was a refresh operation,
      the name server tests the status table to see if the maximum
      holding time derived from the SOA EXPIRE field has elapsed.  If
      not, the name server schedules a retry.  If the maximum holding
      time has expired, the name server invalidates the zone in the
      status table, and scans all resource records tagged with this zone
      number.  For each record it decrements TTL fields by the length of
      time since the data was last refreshed.  If the new TTL value is
      negative, the record is deleted.  If the TTL value is still
      positive, it moves the RR to the cache tree and schedules a retry.
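
      The expiry step can be sketched in Python as follows.  It assumes
      the zone's RRs are held as dictionaries with hypothetical keys, and
      that the status-table update and retry scheduling happen
      elsewhere.  A TTL that reaches exactly zero is treated as expired.
      The text leaves that case open, so this is an assumption.

```python
def expire_zone(zone_rrs, cache, elapsed):
    """Move a zone whose maximum holding time has passed into the cache.

    zone_rrs: list of dicts with hypothetical keys "name", "type", "ttl", "rdata".
    elapsed: seconds since the zone copy was last refreshed.
    """
    for rr in zone_rrs:
        ttl = rr["ttl"] - elapsed
        if ttl <= 0:
            continue  # TTL used up: the record is deleted (zero treated as expired)
        cache.append({**rr, "ttl": ttl})  # survives as non-authoritative cache data
    zone_rrs.clear()  # nothing remains tagged with this zone number
    return cache
```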

      If the open attempt succeeds, the name server sends a query to the
      foreign name server in which QTYPE=SOA, QCLASS is set according to
      the status table information from the configuration file, and
      QNAME is set to the domain name of the zone of interest.

      The foreign name server will return either a SOA record indicating
      that it has the zone or an error.  If an error is detected, the
      virtual circuit is closed, and the failure is treated in the same
      way as if the open attempt failed.

      If the SOA record is returned and this was a refresh, rather than
      an initial load of the zone, the name server compares the SERIAL