8. Traffic Conditioning Blocks (TCBs)
The Classifier, Meter, Action, Algorithmic Dropper, Queue and Scheduler functional datapath elements described above can be combined into Traffic Conditioning Blocks (TCBs). A TCB is an abstraction of a set of functional datapath elements that may be used to facilitate the definition of specific traffic conditioning functionality (e.g., it might be likened to a template which can be replicated multiple times for different traffic streams or different customers). It has no likely physical representation in the implementation of the data path: it is invented purely as an abstraction for use by management tools.

This model describes the configuration and management of a Diffserv interface in terms of a TCB that contains, by definition, zero or more Classifier, Meter, Action, Algorithmic Dropper, Queue and Scheduler elements. These elements are arranged according to the policy being expressed, but always in the order given here: traffic may be classified; classified traffic may be metered; each stream of traffic identified by a combination of classifiers and meters may have some set of actions performed on it, followed by drop algorithms; packets of the traffic stream may ultimately be stored into a queue and then be scheduled out to the next TCB or physical interface. It is permissible to omit elements of any type, to include null elements, or to concatenate multiple functional datapath elements of the same type.

When the Diffserv treatment for a given packet needs to have such building blocks repeated, this is performed by cascading multiple TCBs: an output of one TCB may drive the input of a succeeding one.

For example, consider the case where traffic of a set of classes is shaped to a set of rates, but the total output rate of the group of classes must also be limited to a rate. One might imagine a set of network news feeds, each with a certain maximum rate, and a policy that their aggregate may not exceed some figure. This may be simply accomplished by cascading two TCBs. The first classifies the traffic into its separate feeds and queues each feed separately. The feeds (or a subset of them) are now fed into a second TCB, which places all input (these news feeds) into a single queue with a certain maximum rate. In implementation, one could imagine this as several literal queues, a CBQ or WFQ system with an appropriate (and complex) weighting scheme, or a number of other approaches. But they would have the same externally measurable effect on the traffic as if they had been literally implemented with separate TCBs.

8.1. TCB
A generalized TCB might consist of the following stages:

   -  Classification stage

   -  Metering stage

   -  Action stage (involving Markers, Absolute Droppers, Counters,
      and Multiplexors)

   -  Queuing stage (involving Algorithmic Droppers, Queues, and
      Schedulers)

where each stage may consist of a set of parallel datapaths consisting of pipelined elements. A Classifier or a Meter is typically a 1:N element, an Action, Algorithmic Dropper, or Queue is typically a 1:1 element and a Scheduler is an N:1 element. A complete TCB should, however, result in a 1:1 or 1:N abstract element. Note that the fan-in or fan-out of an element is not an important defining characteristic of this taxonomy.
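Purely as an informal illustration (none of the class or element names below are defined by this model; they are invented for this sketch), the stage structure of a generalized TCB, with each element carrying its typical fan-in and fan-out, might be captured by a management tool roughly as follows:

   # Hypothetical sketch: a TCB as an ordered set of stages of
   # functional datapath elements.  Names are illustrative only.
   from dataclasses import dataclass, field
   from typing import List

   @dataclass
   class Element:
       name: str
       inputs: int        # fan-in  (N:1 for a Scheduler)
       outputs: int       # fan-out (1:N for a Classifier or Meter)

   @dataclass
   class TCB:
       classification: List[Element] = field(default_factory=list)
       metering:       List[Element] = field(default_factory=list)
       action:         List[Element] = field(default_factory=list)
       queuing:        List[Element] = field(default_factory=list)

   tcb = TCB(
       classification=[Element("Classifier1", 1, 4)],                   # 1:N
       metering=[Element("Meter1", 1, 2), Element("Meter2", 1, 2)],     # 1:N
       action=[Element("Marker1", 1, 1), Element("Counter1", 1, 1)],    # 1:1
       queuing=[Element("Queue1", 1, 1), Element("Scheduler1", 4, 1)],  # N:1
   )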
8.1.1. Building blocks for Queuing
Some particular rules are applied to the ordering of elements within a Queuing stage within a TCB: elements of the same type may appear more than once, either in parallel or in series. Typically, a queuing stage will have relatively many elements in parallel and few in series. Iteration and recursion are not supported constructs (the elements are arranged in an acyclic graph). The following interconnections of elements are allowed:

   -  The input of a Queue may be the input of the queuing block, or
      it may be connected to the output of an Algorithmic Dropper, or
      to an output of a Scheduler.

   -  Each input of a Scheduler may be connected to the output of a
      Queue, to the output of an Algorithmic Dropper, or to the output
      of another Scheduler.

   -  The input of an Algorithmic Dropper may be the first element of
      the queuing stage, the output of another Algorithmic Dropper, or
      it may be connected to the output of a Queue (to indicate
      head-dropping).

   -  The output of the queuing block may be the output of a Queue, an
      Algorithmic Dropper, or a Scheduler.

Note, in particular, that Schedulers may operate in series so that a packet at the head of a Queue feeding the concatenated Schedulers is serviced only after all of the scheduling criteria are met. For example, a Queue which carries EF traffic streams may be served first by a non-work-conserving Scheduler to shape the stream to a maximum rate, then by a work-conserving Scheduler to mix EF traffic streams with other traffic streams. Alternatively, there might be a Queue and/or a dropper between the two Schedulers.

Note also that some nonsensical scenarios (e.g., a Queue preceding an Algorithmic Dropper and directly feeding into another Queue) are prohibited.
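Purely as an illustration (the element kinds, function name, and graph encoding below are invented for this sketch and are not part of the model), the interconnection rules above could be encoded as a validity check over a queuing-block graph:

   # Hypothetical sketch: checking the allowed interconnections within
   # a queuing block.  "BlockInput" stands for the input of the block.
   ALLOWED_PREDECESSORS = {
       # element kind       : kinds whose output may feed it
       "Queue":              {"BlockInput", "AlgorithmicDropper", "Scheduler"},
       "Scheduler":          {"Queue", "AlgorithmicDropper", "Scheduler"},
       "AlgorithmicDropper": {"BlockInput", "AlgorithmicDropper", "Queue"},
   }
   ALLOWED_BLOCK_OUTPUTS = {"Queue", "AlgorithmicDropper", "Scheduler"}

   def check_queuing_block(edges, output_kind):
       """edges: iterable of (upstream_kind, downstream_kind) pairs."""
       for upstream, downstream in edges:
           if upstream not in ALLOWED_PREDECESSORS.get(downstream, set()):
               raise ValueError(f"{upstream} may not feed {downstream}")
       if output_kind not in ALLOWED_BLOCK_OUTPUTS:
           raise ValueError(f"{output_kind} may not be the block output")

   # Example: dropper -> queue -> scheduler, with the scheduler as the
   # output of the queuing block.
   check_queuing_block(
       [("BlockInput", "AlgorithmicDropper"),
        ("AlgorithmicDropper", "Queue"),
        ("Queue", "Scheduler")],
       output_kind="Scheduler")

This sketch does not check for cycles; as noted above, the elements must in any case form an acyclic graph.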
8.2. An Example TCB

An SLS is presumed to have been negotiated between the customer and the provider which specifies the handling of the customer's traffic (as defined by a TCS) by the provider's network. The agreement might be of the following form:
      DSCP    PHB    Profile    Treatment
      ----    ---    -------    ----------------------
      001001  EF     Profile4   Discard non-conforming.
      001100  AF11   Profile5   Shape to profile, tail-drop when full.
      001101  AF21   Profile3   Re-mark non-conforming to DSCP 001000,
                                tail-drop when full.
      other   BE     none       Apply RED-like dropping.

This SLS specifies that the customer may submit packets marked for DSCP 001001, which will get EF treatment so long as they remain conforming to Profile4; packets exceeding this profile will be discarded. The discarded packets are counted in this example, perhaps for use by the provider's sales department in convincing the customer to buy a larger SLS. Packets marked for DSCP 001100 will be shaped to Profile5 before forwarding. Packets marked for DSCP 001101 will be metered to Profile3, with non-conforming packets "downgraded" by being re-marked with a DSCP of 001000. It is implicit in this agreement that conforming packets are given the PHB originally indicated by the packets' DSCP field.

Figures 6 and 7 illustrate a TCB that might be used to handle this SLS at an ingress interface at the customer/provider boundary.

The Classification stage of this example consists of a single BA classifier. The BA classifier is used to separate traffic based on the Diffserv service level requested by the customer (as indicated by the DSCP in each submitted packet's IP header). We illustrate three DSCP filter values: A, B, and C. The 'X' in the BA classifier is a wildcard filter that matches every packet not otherwise matched.

The path for DSCP 001100 proceeds directly to AlgDropper1, whilst the paths for DSCP 001001 and 001101 include a metering stage. All other traffic is passed directly on to AlgDropper3. There is a separate meter for each set of packets corresponding to classifier outputs A and C. Each meter uses a specific profile, as specified in the TCS, for the corresponding Diffserv service level. The meters in this example each indicate one of two conformance levels: conforming or non-conforming.

Following the Metering stage is an Action stage in some of the branches. Packets submitted for DSCP 001001 (Classifier output A) that are deemed non-conforming by Meter1 are counted and discarded while packets that are conforming are passed on to Queue1. Packets submitted for DSCP 001101 (Classifier output C) that are deemed non-conforming by Meter2 are re-marked and then both conforming and non-conforming packets are multiplexed together before being passed on to AlgDropper2/Queue3.
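As an aside, a management tool might capture this TCS in a simple table-like structure. The following sketch is illustrative only: the field layout, dictionary keys, and treatment strings are invented here and are not defined by this model or by the SLS above.

   # Hypothetical sketch of the example TCS as configuration data.
   example_tcs = {
       # DSCP:    (PHB,    profile,    treatment of non-conforming traffic)
       "001001": ("EF",   "Profile4", "discard"),
       "001100": ("AF11", "Profile5", "shape, tail-drop when full"),
       "001101": ("AF21", "Profile3", "re-mark to DSCP 001000, tail-drop when full"),
       "other":  ("BE",   None,       "RED-like dropping"),
   }

   def treatment_for(dscp: str) -> tuple:
       """Look up the agreed treatment for a submitted packet's DSCP."""
       return example_tcs.get(dscp, example_tcs["other"])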
The Algorithmic Dropping, Queuing and Scheduling stages are realized as follows, illustrated in figure 7. Note that the figure does not show any of the implicit control linkages between elements that allow e.g., an Algorithmic Dropper to sense the current state of a succeeding Queue.

   [ASCII-art diagram not reproduced here: Classifier1 (BA) splits
    submitted traffic into outputs A, B, C and wildcard X; output A
    feeds Meter1, whose non-conforming branch passes through Counter1
    to AbsoluteDropper1 and whose conforming branch goes to Queue1;
    output B goes to AlgDropper1; output C feeds Meter2, whose
    non-conforming branch passes through Marker1, both branches then
    merging in Mux1 towards AlgDropper2; output X goes to AlgDropper3.]

        Figure 6: An Example Traffic Conditioning Block (Part 1)

Conforming DSCP 001001 packets from Meter1 are passed directly to Queue1: there is no way, with configuration of the following Scheduler to match the metering, for these packets to overflow the depth of Queue1, so there is no requirement for dropping at this point. Packets marked for DSCP 001100 must be passed through a tail-dropper, AlgDropper1, which serves to limit the depth of the following queue, Queue2: packets that arrive to a full queue will be discarded. This is likely to be an error case: the customer is obviously not sticking to its agreed profile. Similarly, all packets from the original DSCP 001101 stream (some may have been re-marked by this stage) are passed to AlgDropper2 and Queue3. Packets marked for all other DSCPs are passed to AlgDropper3 which is a RED-like Algorithmic Dropper: based on feedback of the current depth of Queue4, this dropper is supposed to discard enough packets from its input stream to keep the queue depth under control.
These four Queue elements are then serviced by a Scheduler element, Scheduler1: this must be configured to give each of its inputs an appropriate priority and/or bandwidth share. Inputs A and C are given guarantees of bandwidth, as appropriate for the contracted profiles. Input B is given a limit on the bandwidth it can use (i.e., a non-work-conserving discipline) in order to achieve the desired shaping of this stream. Input D is given no limits or guarantees but a lower priority than the other queues, appropriate for its best-effort status. Traffic then exits the Scheduler in a single orderly stream.

The interconnections of the TCB elements illustrated in Figures 6 and 7 can be represented textually as follows:

   TCB1:

   Classifier1:
      FilterA:             Meter1
      FilterB:             AlgDropper1
      FilterC:             Meter2
      Default:             AlgDropper3

   [ASCII-art diagram not reproduced here: the stream from Meter1
    enters Queue1; the streams from Classifier1 (DSCP 001100), Mux1
    and Classifier1 (default) pass through AlgDropper1/Queue2,
    AlgDropper2/Queue3 and AlgDropper3/Queue4 respectively; Queues 1-4
    feed Scheduler1 inputs A-D, whose single output carries the
    exiting traffic.]

        Figure 7: An Example Traffic Conditioning Block (Part 2)
   Meter1:
      Type:                AverageRate
      Profile:             Profile4
      ConformingOutput:    Queue1
      NonConformingOutput: Counter1

   Counter1:
      Output:              AbsoluteDropper1

   Meter2:
      Type:                AverageRate
      Profile:             Profile3
      ConformingOutput:    Mux1.InputA
      NonConformingOutput: Marker1

   Marker1:
      Type:                DSCPMarker
      Mark:                001000
      Output:              Mux1.InputB

   Mux1:
      Output:              AlgDropper2

   AlgDropper1:
      Type:                AlgorithmicDropper
      Discipline:          Drop-on-threshold
      Trigger:             Queue2.Depth > 10kbyte
      Output:              Queue2

   AlgDropper2:
      Type:                AlgorithmicDropper
      Discipline:          Drop-on-threshold
      Trigger:             Queue3.Depth > 20kbyte
      Output:              Queue3

   AlgDropper3:
      Type:                AlgorithmicDropper
      Discipline:          RED93
      Trigger:             Internal
      Output:              Queue4
      MinThresh:           Queue4.Depth > 20 kbyte
      MaxThresh:           Queue4.Depth > 40 kbyte
      <other RED parms too>
   Queue1:
      Type:                FIFO
      Output:              Scheduler1.InputA

   Queue2:
      Type:                FIFO
      Output:              Scheduler1.InputB

   Queue3:
      Type:                FIFO
      Output:              Scheduler1.InputC

   Queue4:
      Type:                FIFO
      Output:              Scheduler1.InputD

   Scheduler1:
      Type:                Scheduler4Input
      InputA:
         MaxRateProfile:   none
         MinRateProfile:   Profile4
         Priority:         20
      InputB:
         MaxRateProfile:   Profile5
         MinRateProfile:   none
         Priority:         40
      InputC:
         MaxRateProfile:   none
         MinRateProfile:   Profile3
         Priority:         20
      InputD:
         MaxRateProfile:   none
         MinRateProfile:   none
         Priority:         10
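As an illustration only, the per-input scheduling parameters of Scheduler1 might be represented to a management tool roughly as sketched below. The class and field names are invented here, and the sketch assumes that a larger Priority value means higher priority (consistent with InputD, Priority 10, being described above as the lowest-priority, best-effort input); the model itself does not define these details.

   # Hypothetical sketch of Scheduler1's per-input configuration.
   # Profile names refer to the TCS profiles used earlier in this example.
   from dataclasses import dataclass
   from typing import Optional

   @dataclass
   class SchedulerInput:
       queue: str
       max_rate_profile: Optional[str]  # non-work-conserving limit (shaping)
       min_rate_profile: Optional[str]  # bandwidth guarantee
       priority: int                    # assumed: larger value = higher priority

   scheduler1 = [
       SchedulerInput("Queue1", None,       "Profile4", 20),  # InputA: EF guarantee
       SchedulerInput("Queue2", "Profile5", None,       40),  # InputB: shaped stream
       SchedulerInput("Queue3", None,       "Profile3", 20),  # InputC: guarantee
       SchedulerInput("Queue4", None,       None,       10),  # InputD: best effort
   ]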
8.3. An Example TCB to Support Multiple Customers

The TCB described above can be installed on an ingress interface to implement a provider/customer TCS if the interface is dedicated to the customer. However, if a single interface is shared between multiple customers, then the TCB above will not suffice, since it does not differentiate among traffic from different customers: its classification stage uses only BA classifiers. The configuration is readily modified to support the case of multiple customers per interface, as follows.

First, a TCB is defined for each customer to reflect the TCS with that customer: TCB1, defined above, is the TCB for customer 1. Similar elements are created for
TCB2 and for TCB3 which reflect the agreements with customers 2 and 3 respectively. These 3 TCBs may or may not contain similar elements and parameters.

Finally, a classifier is added to the front end to separate the traffic from the three different customers. This forms a new TCB, TCB4, which is illustrated in Figure 8.

A representation of this multi-customer TCB might be:

   TCB4:

   Classifier4:
      Filter1:             to TCB1
      Filter2:             to TCB2
      Filter3:             to TCB3
      No Match:            AbsoluteDropper4

   AbsoluteDropper4:
      Type:                AbsoluteDropper

   TCB1:
      (as defined above)

   TCB2:
      (similar to TCB1, perhaps with different elements or numeric
      parameters)

   TCB3:
      (similar to TCB1, perhaps with different elements or numeric
      parameters)

and the filters, based on each customer's source MAC address, could be defined as follows:

   [ASCII-art diagram not reproduced here: Classifier4 directs
    submitted traffic via outputs A, B and C to TCB1, TCB2 and TCB3
    respectively; the wildcard output X feeds AbsoluteDropper4.]

        Figure 8: An Example of a Multi-Customer TCB

   Filter1:
      Type:             MacAddress
      SrcValue:         01-02-03-04-05-06 (source MAC address of customer 1)
      SrcMask:          FF-FF-FF-FF-FF-FF
      DestValue:        00-00-00-00-00-00
      DestMask:         00-00-00-00-00-00

   Filter2:
      (similar to Filter1 but with customer 2's source MAC address as
      SrcValue)

   Filter3:
      (similar to Filter1 but with customer 3's source MAC address as
      SrcValue)

In this example, Classifier4 separates traffic submitted from different customers based on the source MAC address in submitted packets. Those packets with recognized source MAC addresses are passed to the TCB implementing the TCS with the corresponding customer. Those packets with unrecognized source MAC addresses are passed to a dropper. TCB4 thus has a Classifier stage and an Action element stage which drops all unmatched traffic.
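For illustration only (the function name and argument layout are invented here), matching a source MAC address against such a value/mask filter can be sketched as:

   # Hypothetical sketch of value/mask matching for the MAC filters above.
   def mac_matches(addr: str, value: str, mask: str) -> bool:
       """Compare addr to value under mask, all given as 'xx-xx-...' strings.
       An all-zero mask (as used for DestValue/DestMask above) matches
       any address."""
       a = bytes(int(x, 16) for x in addr.split("-"))
       v = bytes(int(x, 16) for x in value.split("-"))
       m = bytes(int(x, 16) for x in mask.split("-"))
       return all((ab & mb) == (vb & mb) for ab, vb, mb in zip(a, v, m))

   # Filter1: traffic from customer 1's source MAC address matches.
   assert mac_matches("01-02-03-04-05-06",
                      "01-02-03-04-05-06", "FF-FF-FF-FF-FF-FF")
   # The all-zero destination value/mask matches any destination address.
   assert mac_matches("AA-BB-CC-DD-EE-FF",
                      "00-00-00-00-00-00", "00-00-00-00-00-00")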
8.4. TCBs Supporting Microflow-based Services

The TCB illustrated above describes a configuration that might be suitable for enforcing an SLS at a router's ingress. It assumes that the customer marks its own traffic for the appropriate service level. It then limits the rate of aggregate traffic submitted at each service level, thereby protecting the resources of the Diffserv network. It does not provide any isolation between the customer's individual microflows.

A more complex example might be a TCB configuration that offers additional functionality to the customer. It recognizes individual customer microflows and marks each one independently. It also isolates the customer's individual microflows from each other in order to prevent a single microflow from seizing an unfair share of the resources available to the customer at a certain service level. This is illustrated in Figure 9.

Suppose that the customer has an SLS which specifies 2 service levels, to be identified to the provider by DSCP A and DSCP B. Traffic is first directed to an MF classifier which classifies traffic based on miscellaneous classification criteria, to a granularity sufficient to identify individual customer microflows. Each microflow can then be marked for a specific DSCP. The metering
elements limit the contribution of each of the customer's microflows to the service level for which it was marked. Packets exceeding the allowable limit for the microflow are dropped.

   [ASCII-art diagram not reproduced here: an MF classifier,
    Classifier1, splits traffic into individual microflows A, B, C,
    etc.; each microflow passes through its own Marker and Meter, with
    non-conforming packets sent to an Absolute Dropper and conforming
    packets merged in Mux1 and passed on to TCB2.]

     Figure 9: An Example of a Marking and Traffic Isolation TCB

This TCB could be formally specified as follows:

   TCB1:

   Classifier1:            (MF)
      FilterA:             Marker1
      FilterB:             Marker2
      FilterC:             Marker3
      etc.

   Marker1:
      Output:              Meter1

   Marker2:
      Output:              Meter2

   Marker3:
      Output:              Meter3
   Meter1:
      ConformingOutput:    Mux1.InputA
      NonConformingOutput: AbsoluteDropper1

   Meter2:
      ConformingOutput:    Mux1.InputB
      NonConformingOutput: AbsoluteDropper2

   Meter3:
      ConformingOutput:    Mux1.InputC
      NonConformingOutput: AbsoluteDropper3

   etc.

   Mux1:
      Output:              to TCB2

Note that the detailed traffic element declarations are not shown here. Traffic is either dropped by TCB1 or emerges marked for one of two DSCPs. This traffic is then passed to TCB2 which is illustrated in Figure 10. TCB2 could then be specified as follows:

   Classifier2:            (BA)
      FilterA:             Meter5
      FilterB:             Meter6

   [ASCII-art diagram not reproduced here: a BA classifier,
    Classifier2, directs traffic marked with DSCP A to Meter5 and
    traffic marked with DSCP B to Meter6; conforming packets proceed
    to Queue1 and Queue2 respectively, while non-conforming packets
    are sent to AbsoluteDropper4 and AbsoluteDropper5.]

              Figure 10: Additional Example: TCB2

   Meter5:
      ConformingOutput:    Queue1
      NonConformingOutput: AbsoluteDropper4
   Meter6:
      ConformingOutput:    Queue2
      NonConformingOutput: AbsoluteDropper5
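Purely as a sketch (the function and variable names are invented here, and the conformance tests are stand-ins rather than real Meter profiles), the per-microflow isolation performed by TCB1 above, before the conforming remainder is multiplexed on to TCB2, could be outlined as:

   # Hypothetical sketch of TCB1's per-microflow marking and isolation.
   # flow_dscp maps a microflow id to the DSCP it is marked with
   # (Marker1..MarkerN); flow_meter maps a microflow id to its own
   # conformance test (Meter1..MeterN).
   def tcb1_handle(packet, flow_id, flow_dscp, flow_meter):
       packet["dscp"] = flow_dscp[flow_id]      # per-microflow Marker
       if flow_meter[flow_id](packet):          # per-microflow Meter
           return packet                        # Mux1: pass on to TCB2
       return None                              # AbsoluteDropper: discard

   # Two microflows of the same service level: one conforming, one not.
   flow_dscp = {"flow1": "A", "flow2": "A"}
   flow_meter = {"flow1": lambda p: True, "flow2": lambda p: False}
   print(tcb1_handle({"len": 100}, "flow1", flow_dscp, flow_meter))  # marked packet
   print(tcb1_handle({"len": 100}, "flow2", flow_dscp, flow_meter))  # None (dropped)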
8.5. Cascaded TCBs

Nothing in this model prevents more complex scenarios in which one microflow TCB precedes another (e.g., for TCBs implementing separate TCSs for the source and for a set of destinations).

9. Security Considerations
Security vulnerabilities of Diffserv network operation are discussed in [DSARCH]. This document describes an abstract functional model of Diffserv router elements. Certain denial-of-service attacks such as those resulting from resource starvation may be mitigated by appropriate configuration of these router elements; for example, by rate limiting certain traffic streams or by authenticating traffic marked for higher quality-of-service. There may be theft-of-service scenarios where a malicious host can exploit a loose token bucket policer to obtain slightly better QoS than that committed in the TCS.

10. Acknowledgments
Concepts, terminology, and text have been borrowed liberally from [POLTERM], as well as from other IETF work on MIBs and policy management. We wish to thank the authors of some of those documents: Fred Baker, Michael Fine, Keith McCloghrie, John Seligson, Kwok Chan, Scott Hahn, and Andrea Westerinen for their contributions.

This document has benefited from the comments and suggestions of several participants of the Diffserv working group, particularly Shahram Davari, John Strassner, and Walter Weiss. This document could never have reached this level of rough consensus without the relentless pressure of the co-chairs Brian Carpenter and Kathie Nichols, for which the authors are grateful.

11. References
   [AF-PHB]     Heinanen, J., Baker, F., Weiss, W. and J. Wroclawski,
                "Assured Forwarding PHB Group", RFC 2597, June 1999.

   [DSARCH]     Carlson, M., Weiss, W., Blake, S., Wang, Z., Black, D.
                and E. Davies, "An Architecture for Differentiated
                Services", RFC 2475, December 1998.

   [DSFIELD]    Nichols, K., Blake, S., Baker, F. and D. Black,
                "Definition of the Differentiated Services Field (DS
                Field) in the IPv4 and IPv6 Headers", RFC 2474,
                December 1998.

   [DSMIB]      Baker, F., Smith, A. and K. Chan, "Management
                Information Base for the Differentiated Services
                Architecture", RFC 3289, May 2002.

   [E2E]        Bernet, Y., Yavatkar, R., Ford, P., Baker, F., Zhang,
                L., Speer, M., Nichols, K., Braden, R., Davie, B.,
                Wroclawski, J. and E. Felstaine, "A Framework for
                Integrated Services Operation over Diffserv Networks",
                RFC 2998, November 2000.

   [EF-PHB]     Davie, B., Charny, A., Bennett, J.C.R., Benson, K., Le
                Boudec, J.Y., Courtney, W., Davari, S., Firoiu, V. and
                D. Stiliadis, "An Expedited Forwarding PHB (Per-Hop
                Behavior)", RFC 3246, March 2002.

   [FJ95]       Floyd, S. and V. Jacobson, "Link Sharing and Resource
                Management Models for Packet Networks", IEEE/ACM
                Transactions on Networking, Vol. 3 No. 4, August 1995.

   [INTSERV]    Braden, R., Clark, D. and S. Shenker, "Integrated
                Services in the Internet Architecture: an Overview",
                RFC 1633, June 1994.

   [NEWTERMS]   Grossman, D., "New Terminology and Clarifications for
                Diffserv", RFC 3260, April 2002.

   [PDBDEF]     Nichols, K. and B. Carpenter, "Definition of
                Differentiated Services Per Domain Behaviors and Rules
                for Their Specification", RFC 3086, April 2001.

   [POLTERM]    Westerinen, A., Schnizlein, J., Strassner, J.,
                Scherling, M., Quinn, B., Herzog, S., Huynh, A.,
                Carlson, M., Perry, J. and S. Waldbusser, "Policy
                Terminology", RFC 3198, November 2001.

   [QOSDEVMOD]  Strassner, J., Westerinen, A. and B. Moore,
                "Information Model for Describing Network Device QoS
                Mechanisms", Work in Progress.

   [QUEUEMGMT]  Braden, R., Clark, D., Crowcroft, J., Davie, B.,
                Deering, S., Estrin, D., Floyd, S., Jacobson, V.,
                Minshall, G., Partridge, C., Peterson, L.,
                Ramakrishnan, K., Shenker, S., Wroclawski, J. and L.
                Zhang, "Recommendations on Queue Management and
                Congestion Avoidance in the Internet", RFC 2309,
                April 1998.

   [SRTCM]      Heinanen, J. and R. Guerin, "A Single Rate Three Color
                Marker", RFC 2697, September 1999.

   [TRTCM]      Heinanen, J. and R. Guerin, "A Two Rate Three Color
                Marker", RFC 2698, September 1999.

   [VIC]        McCanne, S. and V. Jacobson, "vic: A Flexible Framework
                for Packet Video", ACM Multimedia '95, November 1995,
                San Francisco, CA, pp. 511-522.
                <ftp://ftp.ee.lbl.gov/papers/vic-mm95.ps.Z>

   [802.1D]     "Information technology - Telecommunications and
                information exchange between systems - Local and
                metropolitan area networks - Common specifications -
                Part 3: Media Access Control (MAC) Bridges: Revision.
                This is a revision of ISO/IEC 10038: 1993, 802.1j-1992
                and 802.6k-1992. It incorporates P802.11c, P802.1p and
                P802.12e.", ISO/IEC 15802-3: 1998.
Appendix A. Discussion of Token Buckets and Leaky Buckets

"Leaky bucket" and/or "Token Bucket" models are used to describe rate control in several architectures, including Frame Relay, ATM, Integrated Services and Differentiated Services. Both of these models are, by definition, theoretical relationships between some defined burst size, B, a rate, R, and a time interval, t:

      R = B/t

Thus, a token bucket or leaky bucket might specify an information rate of 1.2 Mbps with a burst size of 1500 bytes. In this case, the token rate is 1,200,000 bits per second, the token burst is 12,000 bits and the token interval is 10 milliseconds. The specification says that conforming traffic will, in the worst case, come in 100 bursts per second of 1500 bytes each and at an average rate not exceeding 1.2 Mbps.

A.1 Leaky Buckets
A leaky bucket algorithm is primarily used for shaping traffic as it leaves an interface onto the network (handled under Queues and Schedulers in this model). Traffic theoretically departs from an interface at a rate of one bit every so many time units (in the example, one bit every 0.83 microseconds) but, in fact, departs in multi-bit units (packets) at a rate approximating the theoretical, as measured over a longer interval. In the example, it might send one 1500 byte packet every 10 ms or perhaps one 500 byte packet every 3.3 ms. It is also possible to build multi-rate leaky buckets in which traffic departs from the interface at varying rates depending on recent activity or inactivity.

Implementations generally seek as constant a transmission rate as achievable. In theory, a 10 Mbps shaped transmission stream from an algorithmic implementation and a stream which is running at 10 Mbps because its bottleneck link has been a 10 Mbps Ethernet link should be indistinguishable. Depending on configuration, the approximation to theoretical smoothness may vary by moving as much as an MTU from one token interval to another. Traffic may also be jostled by other traffic competing for the same transmission resources.
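As an illustration only (the class and method names are invented here; this is a sketch of one possible single-rate shaper, not a definition from this model), a leaky bucket shaper can be viewed as a calculation of the earliest departure time of each packet, so that the long-term output rate never exceeds R:

   # Hypothetical sketch of a single-rate leaky bucket shaper.
   # Each packet of L bits may depart no earlier than L/R seconds after
   # the previous departure began, approximating a constant bit rate.
   class LeakyBucketShaper:
       def __init__(self, rate_bps: float):
           self.rate = rate_bps
           self.next_free = 0.0          # time the "wire" next becomes free

       def departure_time(self, arrival: float, length_bits: int) -> float:
           start = max(arrival, self.next_free)
           self.next_free = start + length_bits / self.rate
           return start

   # Example: a 1.2 Mbps shaper sending back-to-back 1500-byte packets;
   # departures are spaced 10 ms apart, as in the example above.
   shaper = LeakyBucketShaper(1_200_000)
   print([round(shaper.departure_time(0.0, 12_000), 3) for _ in range(3)])
   # -> [0.0, 0.01, 0.02]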
A.2 Token Buckets

A token bucket, on the other hand, measures the arrival rate of traffic from another device. This traffic may originally have been shaped using a leaky bucket shaper or its equivalent. The token bucket determines whether the traffic (still) conforms to the specification. Multi-rate token buckets (e.g., token buckets with
both a peak rate and a mean rate, and sometimes more) are commonly used, such as those described in [SRTCM] and [TRTCM]. In this case, absolute smoothness is not expected, but conformance to one or more of the specified rates is.

Simplistically, a data stream is said to conform to a simple token bucket parameterized by {R, B} if the system receives, in any time interval t, an amount of data not exceeding (R * t) + B.

For a multi-rate token bucket case, the data stream is said to conform if, for each of the rates, the stream conforms to the token bucket profile appropriate for traffic of that class. For example, received traffic that arrives pre-classified as one of the "excess" rates (e.g., AF12 or AF13 traffic for a device implementing the AF1x PHB) is only compared to the relevant "excess" token bucket profile.
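Illustratively (the function name and argument names are invented here), this simple conformance statement is just a direct check over an observation interval:

   # Hypothetical sketch of the simple {R, B} conformance statement above:
   # over any interval of length t, the amount of received data may not
   # exceed R*t + B.
   def interval_conforms(data_received: float, t: float, R: float, B: float) -> bool:
       return data_received <= R * t + B

   # Example with the numbers from the start of this appendix:
   # R = 1,200,000 bits/s, B = 12,000 bits, over a 10 ms interval.
   print(interval_conforms(24_000, 0.01, 1_200_000, 12_000))   # True
   print(interval_conforms(36_000, 0.01, 1_200_000, 12_000))   # False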
A.3 Some Consequences

The fact that Internet Protocol data is organized into variable length packets introduces some uncertainty in the conformance decision made by any downstream Meter that is attempting to determine conformance to a traffic profile that is theoretically designed for fixed-length units of data.

When used as a leaky bucket shaper, the above definition interacts with clock granularity in ways one might not expect. A leaky bucket releases a packet only when all of its bits would have been allowed: it does not borrow from future capacity. If the clock is very fine grain, on the order of the bit rate or faster, this is not an issue. But if the clock is relatively slow (and millisecond or multi-millisecond clocks are not unusual in networking equipment), this can introduce jitter to the shaped stream.

This leaves an implementor of a token bucket Meter with a dilemma. When the number of bandwidth tokens, b, left in the token bucket is positive but less than the size of the packet being operated on, L, one of three actions can be performed:

   (1) The whole size of the packet can be subtracted from the bucket,
       leaving it negative, remembering that, when new tokens are next
       added to the bucket, the new token allocation, B, must be added
       to b rather than simply setting the bucket to "full".  This
       option potentially puts more than the desired burst size of
       data into this token bucket interval and correspondingly less
       into the next.  It does, however, keep the average amount
       accepted per token bucket interval equal to the token burst.
       This approach accepts traffic if any one bit in the packet
       would have been accepted and borrows up to one MTU of capacity
       from one or more subsequent intervals when necessary.  Such a
       token bucket meter implementation is said to offer "loose"
       conformance to the token bucket.

   (2) Alternatively, the packet can be rejected and the amount of
       tokens in the bucket left unchanged (and maybe an attempt could
       be made to accept the packet under another threshold in another
       bucket), remembering that, when new tokens are next added to
       the bucket, the new token allocation, B, must be added to b
       rather than simply setting the bucket to "full".  This
       potentially puts less than the permissible burst size of data
       into this token bucket interval and correspondingly more into
       the next.  Like the first option, it keeps the average amount
       accepted per token bucket interval equal to the token burst.
       This approach accepts traffic only if every bit in the packet
       would have been accepted and borrows up to one MTU of capacity
       from one or more previous intervals when necessary.  Such a
       token bucket meter implementation is said to offer "strict" (or
       perhaps "stricter") conformance to the token bucket.  This
       option is consistent with [SRTCM] and [TRTCM] and is often used
       in ATM and frame-relay implementations.

   (3) The TB variable can be set to zero to account for the first
       part of the packet and the remainder of the packet size can be
       taken out of the next-colored bucket.  This, of course, has
       another bug: the same packet cannot have both conforming and
       non-conforming components in the Diffserv architecture, so this
       option is not really appropriate here and we do not discuss it
       further.

Unfortunately, the one thing that cannot be done is to fit the token burst specification exactly with random sized packets: token buckets in a variable length packet environment therefore always have some variance from theoretical reality. This has also been observed in the ATM Guaranteed Frame Rate (GFR) service category specification and in Frame Relay.

A number of observations may be made:

   o  Operationally, a token bucket meter is reasonable for traffic
      which has been shaped by a leaky bucket shaper or a serial line.
      However, traffic in the Internet is rarely shaped in that way:
      TCP applies no shaping to its traffic, but rather depends on
      longer-range ACK-clocking behavior to help it approximate a
      certain rate and explicitly sends traffic bursts during slow
      start, retransmission, and fast recovery.  Video-on-IP
      implementations such as [VIC] may have a leaky bucket shaper
      available to them,
      but often do not, and simply enqueue the output of their codec
      for transmission on the appropriate interface.  As a result, in
      each of these cases, a token bucket meter may reject traffic in
      the short term (over a single token interval) which it would
      have accepted if it had a longer time in view and which it needs
      to accept for the application to work properly.  To work around
      this, the token interval, B/R, must approximate or exceed the
      RTT of the session(s) in question and the burst size, B, must
      accommodate the largest burst that the originator might send.

   o  The behavior of a loose token bucket is significantly different
      from the token bucket description for ATM and for Frame Relay.

   o  A loose token bucket does not accept packets while the token
      count is negative.  This means that, when a large packet has
      just borrowed tokens from the future, even a small incoming
      packet (e.g., a 40-byte TCP ACK/SYN) will not be accepted.
      Therefore, if such a loose token bucket is configured with a
      burst size close to the MTU, some discrimination against smaller
      packets can take place: use of a larger burst size avoids this
      problem.

   o  The converse of the above is that a strict token bucket
      sometimes does not accept large packets when a loose one would
      do so.  Therefore, if such a strict token bucket is configured
      with a burst size close to the MTU, some discrimination against
      larger packets can take place: use of a larger burst size avoids
      this problem.

   o  In real-world deployments, MTUs are often larger than the burst
      size offered by a link-layer network service provider.  If so
      then it is possible that a strict token bucket meter would find
      that traffic never matches the specified profile: this may be
      avoided by not allowing such a specification to be used.  This
      situation cannot arise with a loose token bucket since the
      smallest burst size that can be configured is 1 bit, by
      definition limiting a loose token bucket to having a burst size
      of greater than one MTU.

   o  Both strict token bucket specifications, as specified in [SRTCM]
      and [TRTCM], and loose ones, are subject to a persistent
      under-run.  These accumulate burst capacity over time, up to the
      maximum burst size.  Suppose that the maximum burst size is
      exactly the size of the packets being sent - which one might
      call the "strictest" token bucket implementation.  In such a
      case, when one packet has been accepted, the token depth becomes
      zero and starts to accumulate again.  If the next packet is
      received any time earlier than a token interval later, it will
      not be accepted.  If the next packet arrives exactly on time, it
      will be accepted and the token depth again set to zero.  If it
      arrives later, however, accumulation of tokens will have stopped
      because it is capped by the maximum burst size: during the
      interval between the bucket becoming full and the actual arrival
      of the packet, no new tokens are added.  As a result, jitter
      that accumulates across multiple hops in the network conspires
      against the algorithm to reduce the actual acceptance rate.
      Thus it usually makes sense to set the maximum token bucket size
      somewhat greater than the MTU in order to absorb some of the
      jitter and allow a practical acceptance rate more in line with
      the desired theoretical rate.

A.4 Mathematical Definition of Strict Token Bucket Conformance
The strict token bucket conformance behavior defined in [SRTCM] and [TRTCM] is not mandatory for compliance with any current Diffserv standards, but we give here a mathematical definition of two-parameter token bucket operation which is consistent with those documents and which can also be used to define a shaping profile.

Define a token bucket with bucket size B, token accumulation rate R and instantaneous token occupancy b(t). Assume that b(0) = B. Then after an arbitrary interval with no packet arrivals, b(t) will not change since the bucket is already full of tokens. Assume a packet of size L bytes arrives at time t'. The bucket occupancy is still B. Then, as long as L <= B, the packet conforms to the meter, and afterwards

      b(t') = B - L.

Assume now an interval delta_t = t - t' elapses before the next packet arrives, of size L' <= B. Just before this, at time t-, the bucket has accumulated delta_t*R tokens over the interval, up to a maximum of B tokens, so that:

      b(t-) = min{ B, b(t') + delta_t*R }

For a strict token bucket, the conformance test is as follows:

      if (b(t-) - L' >= 0) {
         /* the packet conforms */
         b(t) = b(t-) - L';
      } else {
         /* the packet does not conform */
         b(t) = b(t-);
      }
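A small executable sketch of this definition, together with the "loose" variant described in A.3 for contrast, might look as follows. The class, method and parameter names are invented here and nothing in this sketch is mandated by this model or by [SRTCM]/[TRTCM].

   # Hypothetical sketch of two-parameter token bucket meters.
   # 'strict' follows the conformance test above; 'loose' accepts a
   # packet whenever the token count is positive, letting the count go
   # negative (borrowing from subsequent intervals), as in A.3 option (1).
   class TokenBucketMeter:
       def __init__(self, rate, burst, strict=True):
           self.R = float(rate)     # token accumulation rate (units/sec)
           self.B = float(burst)    # bucket size
           self.b = float(burst)    # current occupancy, b(0) = B
           self.strict = strict
           self.t_last = 0.0

       def conforms(self, t, size):
           # accumulate tokens since the last arrival, capped at B
           self.b = min(self.B, self.b + (t - self.t_last) * self.R)
           self.t_last = t
           if self.strict:
               ok = self.b - size >= 0
           else:
               ok = self.b > 0          # loose: any positive credit will do
           if ok:
               self.b -= size           # may go negative in the loose case
           return ok

   # Example with the numbers from the start of this appendix:
   # R = 1,200,000 bits/s, B = 12,000 bits, 12,000-bit (1500-byte) packets.
   m = TokenBucketMeter(rate=1_200_000, burst=12_000, strict=True)
   print(m.conforms(0.0, 12_000),    # True:  bucket starts full
         m.conforms(0.0, 12_000),    # False: no tokens have accumulated yet
         m.conforms(0.01, 12_000))   # True:  one token interval has elapsed
   # -> True False True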
This function can also be used to define a shaping profile. If a packet of size L arrives at time t, it will be eligible for transmission at time te given as follows (we still assume L <= B):

      te = max{ t, t" }

where

      t" = ( L - b(t') + t'*R ) / R

is the time at which b(t") = L, i.e., the time when L credits have accumulated in the bucket and when the packet would conform if the token bucket were a meter. te != t" only if t > t".

A mathematical definition along these lines for loose token bucket conformance is left as an exercise for the reader.
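Continuing the sketch above (again purely illustrative, with invented names), the shaping use of the same bucket state is just the computation of the eligibility time te:

   # Hypothetical sketch: eligibility time for a shaper using the same
   # two-parameter token bucket state as in the meter sketch above.
   def eligibility_time(t, size, b_prev, t_prev, rate):
       """t: arrival time of the packet of the given size;
       b_prev, t_prev: bucket occupancy and time at the previous update;
       rate: token accumulation rate R.  Assumes size <= B."""
       t_fill = (size - b_prev + t_prev * rate) / rate   # time when b = size
       return max(t, t_fill)                             # te = max{ t, t" }

   # A packet needing 12,000 tokens, with an empty bucket at time 0 and
   # R = 1,200,000, becomes eligible 10 ms later even if it arrived at 0.
   print(eligibility_time(t=0.0, size=12_000, b_prev=0.0, t_prev=0.0,
                          rate=1_200_000))   # -> 0.01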
Authors' Addresses

   Yoram Bernet
   Microsoft
   One Microsoft Way
   Redmond, WA 98052

   Phone: +1 425 936 9568
   EMail: ybernet@msn.com


   Steven Blake
   Ericsson
   920 Main Campus Drive, Suite 500
   Raleigh, NC 27606

   Phone: +1 919 472 9913
   EMail: steven.blake@ericsson.com


   Daniel Grossman
   Motorola Inc.
   20 Cabot Blvd.
   Mansfield, MA 02048

   Phone: +1 508 261 5312
   EMail: dan@dma.isg.mot.com


   Andrew Smith (editor)
   Harbour Networks
   Jiuling Building
   21 North Xisanhuan Ave.
   Beijing, 100089
   PRC

   Fax: +1 415 345 1827
   EMail: ah_smith@acm.org
Full Copyright Statement

   Copyright (C) The Internet Society (2002). All Rights Reserved.

   This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

   The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the Internet Society.