Network Working Group                                     K. Chan
Request for Comments: 3317                        Nortel Networks
Category: Informational                                 R. Sahita
                                                          S. Hahn
                                                            Intel
                                                    K. McCloghrie
                                                    Cisco Systems
                                                       March 2003

Differentiated Services Quality of Service Policy Information Base

Status of this Memo

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract
This document describes a Policy Information Base (PIB) for a device implementing the Differentiated Services Architecture. The provisioning classes defined here provide policy control over resources implementing the Differentiated Services Architecture. These provisioning classes can be used with other, non-Differentiated Services provisioning classes (defined in other PIBs) to provide a comprehensive policy-controlled mapping of service requirements to device resource capability and usage.
Table of Contents
Conventions used in this document
1. Glossary
2. Introduction
3. Relationship to the DiffServ Informal Management Model
   3.1. PIB Overview
4. Structure of the PIB
   4.1. General Conventions
   4.2. DiffServ Data Paths
      4.2.1. Data Path PRC
   4.3. Classifiers
      4.3.1. Classifier PRC
      4.3.2. Classifier Element PRC
   4.4. Meters
      4.4.1. Meter PRC
      4.4.2. Token-Bucket Parameter PRC
   4.5. Actions
      4.5.1. DSCP Mark Action PRC
   4.6. Queueing Elements
      4.6.1. Algorithmic Dropper PRC
      4.6.2. Random Dropper PRC
      4.6.3. Queues and Schedulers
   4.7. Specifying Device Capabilities
5. PIB Usage Example
   5.1. Data Path Example
   5.2. Classifier and Classifier Element Example
   5.3. Meter Example
   5.4. Action Example
   5.5. Dropper Examples
      5.5.1. Tail Dropper Example
      5.5.2. Single Queue Random Dropper Example
      5.5.3. Multiple Queue Random Dropper Example
   5.6. Queue and Scheduler Example
6. Summary of the DiffServ PIB
7. PIB Operational Overview
8. PIB Definition
9. Acknowledgments
10. Security Considerations
11. Intellectual Property Considerations
12. IANA Considerations
13. Normative References
14. Authors' Addresses
15. Full Copyright Statement
Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

1. Glossary
PRC   Provisioning Class. A type of policy data. See [POLTERM].
PRI   Provisioning Instance. An instance of a PRC. See [POLTERM].
PIB   Policy Information Base. The database of policy information. See [POLTERM].
PDP   Policy Decision Point. See [RAP-FRAMEWORK].
PEP   Policy Enforcement Point. See [RAP-FRAMEWORK].
PRID  Provisioning Instance Identifier. Uniquely identifies an instance of a PRC.

2. Introduction
[SPPI] describes a structure for specifying policy information that can then be transmitted to a network device for the purpose of configuring policy at that device. The model underlying this structure is one of well-defined provisioning classes and instances of these classes residing in a virtual information store called the Policy Information Base (PIB). This document specifies a set of provisioning classes specifically for configuring QoS Policy for Differentiated Services [DSARCH].

One way to provision policy is by means of the COPS protocol [COPS], with the extensions for provisioning [COPS-PR]. This protocol supports multiple clients, each of which may provision policy for a specific policy domain such as QoS. The PRCs defined in this DiffServ QoS PIB are intended for use by the COPS-PR diffServ client type. Furthermore, these PRCs are in addition to any other PIBs that may be defined for the diffServ client type in the future, as well as the PRCs defined in the Framework PIB [FR-PIB].

3. Relationship to the DiffServ Informal Management Model
This PIB is designed according to the Differentiated Services Informal Management Model documented in [MODEL]. The model describes the way that ingress and egress interfaces of an 'n'-port router are modeled. It describes the configuration and management of a DiffServ interface in terms of a Traffic Conditioning Block (TCB), which contains, by definition, zero or more classifiers, meters, actions, algorithmic droppers, queues and schedulers. These elements are arranged according to the QoS policy being expressed, and are always in that order. Traffic may be classified; classified traffic may be metered; each stream of traffic identified by a combination of classifiers and meters may have some set of actions performed on it; it may have dropping algorithms applied; and it may ultimately be stored into a queue before being scheduled out to its next destination, either onto a link or to another TCB. When the treatment for a given packet must have any of those elements repeated in a way that breaks the permitted sequence {classifier, meter, action, algorithmic dropper, queue, scheduler}, this must be modeled by cascading multiple TCBs.

The PIB represents this cascade by following the "Next" attributes of the various elements. They indicate what the next step in DiffServ processing will be, whether it be a classifier, meter, action, algorithmic dropper, queue, scheduler, or a decision to forward the packet.

The PIB models the individual elements that make up the TCBs. The higher-level concept of a TCB is not required in the parameterization or in the linking together of the individual elements; hence it is not used in the PIB itself and is only mentioned in the text for relating the PIB to [MODEL]. Identifying which TCB a specific element belongs to is not needed for instrumenting a device to support the functionalities of DiffServ, but it is useful for conceptual reasons. By not using the TCB concept, this PIB allows any grouping of elements to construct TCBs, using rules indicated by [MODEL]. This minimizes changes to this PIB if the rules in [MODEL] change.

The notion of a Data Path is used in this PIB to indicate the DiffServ processing a packet may experience. This Data Path is distinguished based on the Role Combination, Capability Set, and Direction of the flow the packet is part of.
A Data Path Table Entry indicates the first of possibly multiple elements that will apply DiffServ treatment to the packet.

3.1. PIB Overview
This PIB is structured based on the need to configure the sequential DiffServ treatments being applied to a packet, and the parameterization of these treatments. These two aspects of the configuration are kept separate throughout the design of the PIB, and are fulfilled using separate tables and data definitions. In addition, the PIB includes tables describing the capabilities and limitations of the device using a general extensible framework.
These tables are reported to the PDP and assist the PDP with the configuration of functional elements that can be realized by the device. This exchange of capabilities and limitations allows one or more devices to support many different variations of a functional datapath element, and so allows diverse methods of providing a general functional datapath element.

In this PIB, the ingress and egress portions of a router are configured independently but in the same manner. The difference is distinguished by an attribute in a table describing the start of the data path. Each interface performs some or all of the following high-level functions:

- Classify each packet according to some set of rules.
- Determine whether the data stream the packet is part of is within or outside its metering parameters.
- Perform a set of resulting actions such as counting and marking of the traffic with a Differentiated Services Code Point (DSCP) as defined in [DSFIELD].
- Apply the appropriate drop policy, either simple or complex algorithmic drop functionality.
- Enqueue the traffic for output in the appropriate queue, whose scheduler may shape the traffic or simply forward it with some minimum rate or maximum latency.

The PIB therefore contains the following elements:

Data Path Table
   This describes the starting point of DiffServ data paths within a single DiffServ device. This class describes data paths specific to an interface role combination and interface direction.

Classifier Tables
   A general extensible framework for specifying a group of filters.

Meter Tables
   A general extensible framework and one example of a parameterization table, the TBParam table, applicable to the Simple Token Bucket Meter, Average Rate Meter, Single Rate Three Color Meter, Two Rate Three Color Meter, and Sliding Window Three Color Meter.

Action Tables
   A general extensible framework and example parameterization tables for the Mark action. The "multiplexer" and "null" actions described in [MODEL] are accomplished implicitly by means of the Prid structures of the other elements.

Algorithmic Dropper Tables
   A general extensible framework for describing the dropper functional datapath element. This includes the absolute dropper and other algorithmic droppers that depend on queue measurement.

Queue and Scheduler Tables
   A general extensible framework for parameterizing queuing and scheduler systems. Note that a shaper is considered a type of scheduler and is included here.

Capabilities Tables
   A general extensible framework for defining the capabilities and limitations of the elements listed above. The capability tables allow intelligent configuration of the elements by a PDP.

4. Structure of the PIB
4.1. General Conventions
The PIB consists of PRCs that represent functional elements in the data path (e.g., classifiers, meters, actions), and classes that specify parameters that apply to a certain type of functional element (e.g., a Token Bucket meter or a Mark action). Parameters are typically specified in a separate PRC so that parameter classes can be used by multiple policies.

Functional element PRCs use the Prid TC (defined in [SPPI]) to indicate indirection. A Prid is an object identifier that is used to specify an instance of a PRC in another table. A Prid is used to point to the parameter PRC that applies to a functional element, such as which filter should be used for a classifier element. A Prid is also used to specify an instance of a functional element PRC that describes what treatment should be applied next for a packet in the data path. Note that using Prids to specify parameter PRCs allows the same functional element PRC to be extended with a number of different types of parameter PRCs. In addition, using Prids to indicate the next functional datapath element allows the elements to be ordered in any way.
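To make the Prid indirection concrete, here is a minimal Python sketch that models PRIs as objects in a dictionary keyed by hypothetical PRID strings. A meter PRI holds one Prid to its parameter PRI (in the role of dsMeterSpecific) and two Prids to downstream elements; two meters share one Token-Bucket parameter PRI while each keeps its own bucket state. The simple token-bucket test, the units (kbit/s and bytes), and all PRID strings are assumptions for illustration, not definitions taken from this PIB.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TBParam:
    """Stand-in for a Token-Bucket Parameter PRI: parameters only, no state."""
    rate_kbps: int    # role of dsTBParamRate (kbit/s assumed)
    burst_bytes: int  # role of dsTBParamBurstSize (bytes assumed)


@dataclass
class Meter:
    """Stand-in for a Meter PRI: Prids to its parameters and to two next elements."""
    succeed_next: str  # Prid of the element for conforming traffic
    fail_next: str     # Prid of the element for non-conforming traffic
    specific: str      # Prid of the parameter PRI (role of dsMeterSpecific)
    tokens: Optional[float] = None  # bucket state belongs to the meter, not the shared PRI
    last_time: float = 0.0

    def next_prid(self, size_bytes: int, now: float, store: Dict[str, object]) -> str:
        params = store[self.specific]  # follow the Prid to the parameter PRI
        if not isinstance(params, TBParam):
            raise TypeError(f"{self.specific} is not a token-bucket parameter PRI")
        fill_rate = params.rate_kbps * 1000 / 8  # bytes per second
        if self.tokens is None:
            self.tokens = float(params.burst_bytes)  # bucket starts full
        else:
            refill = (now - self.last_time) * fill_rate
            self.tokens = min(float(params.burst_bytes), self.tokens + refill)
        self.last_time = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return self.succeed_next
        return self.fail_next


# Hypothetical PRIDs; both meters point at the same parameter PRI.
store: Dict[str, object] = {
    "tbparam.1": TBParam(rate_kbps=800, burst_bytes=3000),
    "meter.1": Meter("queue.af1", "action.remark", "tbparam.1"),
    "meter.2": Meter("queue.af2", "action.remark", "tbparam.1"),
}

m1 = store["meter.1"]
print([m1.next_prid(1500, 0.0, store) for _ in range(3)])
# ['queue.af1', 'queue.af1', 'action.remark']
print(store["meter.2"].next_prid(1500, 0.0, store))
# queue.af2
```

Keeping the parameters in a separate, shareable PRI while the per-meter state lives with the functional element mirrors the separation between data path description and parameterization described above.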
4.2. DiffServ Data Paths
This part of the PIB provides instrumentation for connecting the DiffServ Functional Elements within a single DiffServ device. Please refer to [MODEL] for discussions on the valid sequencing and grouping of DiffServ Functional Elements. Given some basic information, e.g., the interface capability, role combination and direction, the first DiffServ Functional Element is determined. Subsequent DiffServ Functional Elements are provided by the "Next" pointer attribute of each entry of the data path tables. A description of how this "Next" pointer is used in each table is provided in the respective DESCRIPTION clauses.

4.2.1. Data Path PRC
The Data Path PRC provides the DiffServ treatment starting points for all packets of this DiffServ device. Each instance of this PRC specifies the interface capability, role combination and direction for the packet flow. There should be at most two entries for each (interface type, role combination, interface capability) combination, one for ingress and one for egress. Each instance provides the first DiffServ Functional Element that each packet, at a specific interface (identified by the roles assigned to the interface) traveling in a specific relative direction, should experience. Note that this class is interface specific, through its use of the interface type capability set and RoleCombination.

To indicate explicitly that there are no DiffServ treatments for a particular interface type capability set, role combination and direction, an instance of the Data Path PRC can be created with zeroDotZero in the dsDataPathStart attribute. This situation can also be indicated implicitly by not supplying an instance of a Data Path PRC for that particular interface type capability set, role combination and direction. The choice between the explicit and the implicit form is up to the implementation. In either case, the PEP should perform normal IP device processing when zeroDotZero is used in the dsDataPathStart attribute or when the entry does not exist. Normal IP device processing depends on the device; for example, it can be forwarding the packet.

Based on implementation experience with network devices in which data path functional elements are implemented in separate physical processors or application specific integrated circuits, separated by a switch fabric, more complex notions of data path appear to be required within the network device to correlate the physically separate data path functional elements. For example, ingress processing may have identified a specific ingress flow that is aggregated with other ingress flows at an egress data path functional element. Some of the information determined at the ingress data path functional element may need to be used by the egress data path functional element. In numerous implementations, such information has been carried by adding it to the frame or memory block used to carry the flow within the network device; some implementers have called such information a "preamble" or a "frame descriptor". Different implementations use different formats for such information. At first, such information may appear to be an implementation detail internal to the network device that need not be exposed outside it. From a policy control point of view, however, it is very useful for providing network resource usage feedback from the network device to the policy server. This is accomplished by using the Internal Label Marker and Filter PRCs defined in [FR-PIB].

4.3. Classifiers
The classifier and classifier element tables determine how traffic is sorted out. They identify separable classes of traffic, by reference to appropriate filters, which may select anything from an individual micro-flow to aggregates identified by DSCP. The classification is used to send these separate streams to appropriate Meter, Action, Algorithmic Dropper, Queue and Scheduler elements. For example, to indicate a multi-stage meter, sub-classes of traffic may be sent to different meter stages: in an implementation of the Assured Forwarding (AF) PHB [AF-PHB], AF11 traffic might be sent to the first meter, AF12 traffic to the second, and AF13 traffic to the second meter stage's out-of-profile action.

The concept of a classifier is the same as described in [MODEL]. The structure of the classifier and classifier element tables is the same as the classifier described in [MODEL]. Classifier elements have an associated precedence order solely for the purpose of resolving ambiguity between overlapping filters. Filters with higher precedence values are compared first; the order of tests for entries of the same precedence is unimportant.

A datapath may consist of more than one classifier. Filters of different classifiers may overlap. The first classifier functional datapath element encountered, as determined by the sequencing of DiffServ functional datapath elements, is used first. An important form of classifier is "everything else": the final stage of the classifier, i.e., the one with the lowest precedence, must be "complete", since the result of an incomplete classifier is not necessarily deterministic; see [MODEL] section 4.1.2.
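The precedence rule can be sketched in Python as follows. Filters are modeled as plain predicates rather than as PRIs reached through a Prid, and the precedence values and Prid strings are hypothetical; DSCP 46 (EF) and 10, 12, 14 (AF11, AF12, AF13) are the standard code points. The "everything else" element is deliberately listed first, to show that evaluation order comes from precedence rather than from table order.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Packet = Dict[str, int]


@dataclass
class ClfrElement:
    """Stand-in for a Classifier Element PRI."""
    precedence: int                    # higher values are compared first
    matches: Callable[[Packet], bool]  # stands in for the filter PRI its Prid names
    next_prid: str                     # element that receives matching traffic


def classify(elements: List[ClfrElement], pkt: Packet) -> str:
    """Return the next-element Prid of the highest-precedence matching element."""
    for element in sorted(elements, key=lambda e: e.precedence, reverse=True):
        if element.matches(pkt):
            return element.next_prid
    # Reached only when no "everything else" element was installed.
    raise LookupError("incomplete classifier: no element matched")


classifier = [
    ClfrElement(0, lambda p: True, "queue.besteffort"),                 # "everything else"
    ClfrElement(100, lambda p: p["dscp"] == 46, "meter.ef"),            # EF
    ClfrElement(90, lambda p: p["dscp"] in (10, 12, 14), "meter.af1"),  # AF11 to AF13
]

for dscp in (46, 12, 0):
    print(dscp, classify(classifier, {"dscp": dscp}))
# 46 meter.ef
# 12 meter.af1
# 0 queue.besteffort
```

Dropping the precedence-0 element makes `classify` raise for unmatched traffic, which is the non-deterministic "incomplete classifier" case the text warns about, surfaced here as an explicit error.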
When a classifier PRC is instantiated at the PEP, it should always have at least one classifier element table entry, the "everything else" classifier element, with its filter matching all IP packets. This "everything else" classifier element should be created by the PDP as part of the classifier setup. The PDP has full control of all classifier PRIs instantiated at the PEP.

The definition of the actual filter to be used by the classifier is referenced via a Prid: this enables the use of any sort of filter table that one might wish to design, standard or proprietary. No filters are defined in this PIB. However, standard filters for IP packets are defined in the Framework PIB [FR-PIB].

4.3.1. Classifier PRC
Classifiers, used in various ingress and egress interfaces, are organized by the instances of the Classifier PRC. A data path entry points to a classifier entry. A classifier entry identifies a list of classifier elements. A classifier element effectively includes the filter entry, and points to a "next" classifier entry or some other data path functional element.

4.3.2. Classifier Element PRC
Classifier elements point to the filters which identify various classes of traffic. The separation between the "classifier element" and the "filter" allows us to use many different kinds of filters with the same essential semantics of "an identified set of traffic". The traffic matching the filter corresponding to a classifier element is given to the "next" data path functional element identified in the classifier element. An example of a filter that may be pointed to by a Classifier Element PRI is the frwkIpFilter PRC, defined in [FR-PIB].

4.4. Meters
A meter, according to [MODEL] section 5, measures the rate at which packets composing a stream of traffic pass it, compares this rate to some set of thresholds, and produces some number (two or more) of potential results. A given packet is said to "conform" to the meter if, at the time the packet is being looked at, the stream appears to be within the meter's profile. PIB syntax makes it easiest to define this as a sequence of one or more cascaded pass/fail tests, modeled here as if-then-else constructs. It is important to understand that this way of modeling does not imply anything about the implementation being "sequential": multi-rate/multi-profile meters, e.g., those designed to support [SRTCM], [TRTCM], or [TSWTCM], can still be modeled this way even if they, of necessity, share information between the stages: the stages are introduced merely as a notational convenience in order to simplify the PIB structure.

4.4.1. Meter PRC
The generic meter PRC is used as a base for all more specific forms of meter. The definition of parameters specific to the type of meter used is referenced via a pointer to an instance of a PRC containing those specifics. This enables the use of any sort of specific meter table that one might wish to design, standard or proprietary. One specific meter table is defined in this PIB module. Other meter tables may be defined in other PIB modules.

4.4.2. Token-Bucket Parameter PRC
This is included as an example of a common type of meter. Entries in this class are referenced from the dsMeterSpecific attributes of meter PRC instances. The parameters are represented by a rate, dsTBParamRate; a burst size, dsTBParamBurstSize; and an interval, dsTBParamInterval. The type of meter being parameterized is indicated by the dsTBParamType attribute, which determines how the rate, burst, and rate interval parameters are used. Additional meter parameterization classes can be defined in other PIBs when necessary.

4.5. Actions
Actions include "no action", "mark the traffic with a DSCP", or "specific action". Other tasks such as "shape the traffic" or "drop based on some algorithm" are handled by other functional datapath elements rather than by actions. The "multiplexer", "replicator", and "null" actions described in [MODEL] are accomplished implicitly through various combinations of the other elements.

This PIB uses the Action PRC dsActionTable to organize one Action's relationship with the element(s) before and after it. Actions can be cascaded, so that multiple Actions are applied to a single traffic stream, by using each entry's dsActionNext attribute. The dsActionNext attribute of the last action entry in the chain points to the next element in the TCB, if any, e.g., a Queueing element. It may also point to a next TCB.

The parameters needed for the Action element depend on the type of Action to be taken. Hence the PIB allows for specific Action Tables for the different Action types. This flexibility allows additional Actions to be specified in other PIBs and also allows the use of proprietary Actions without impact on those defined here.
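A minimal Python sketch of action cascading, assuming dictionary-based PRIs with hypothetical PRID strings. The first action marks the packet as AF12 (DSCP 12) using a value held in a separate mark parameter PRI; the second is a hypothetical counting action standing in for a "specific action"; and the last next pointer leads to a queue. The marking step writes the 6-bit DSCP into the upper bits of the DS field [DSFIELD] and leaves the two low-order bits untouched.

```python
from typing import Dict


def set_dscp(ds_field: int, dscp: int) -> int:
    """Write a 6-bit DSCP into the upper six bits of the DS field, keeping the low two bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("a DSCP is a 6-bit value")
    return (dscp << 2) | (ds_field & 0x03)


# Hypothetical PRIs: two cascaded actions, a mark parameter PRI, and a queue.
store: Dict[str, dict] = {
    "action.1": {"kind": "action", "type": "mark", "specific": "mark.af12", "next": "action.2"},
    "action.2": {"kind": "action", "type": "count", "next": "queue.af1"},  # hypothetical type
    "mark.af12": {"kind": "param", "dscp": 12},  # AF12
    "queue.af1": {"kind": "queue"},
}


def run_actions(start: str, pkt: Dict[str, int], counters: Dict[str, int]) -> str:
    """Apply each action in the chain, following its next Prid; return the first non-action Prid."""
    prid = start
    while prid in store and store[prid]["kind"] == "action":
        action = store[prid]
        if action["type"] == "mark":
            pkt["ds"] = set_dscp(pkt["ds"], store[action["specific"]]["dscp"])
        elif action["type"] == "count":
            counters[prid] = counters.get(prid, 0) + 1
        prid = action["next"]
    return prid


pkt = {"ds": 0x29}  # DSCP 10 (AF11) with the two low-order bits set to 01
counters: Dict[str, int] = {}
print(run_actions("action.1", pkt, counters), hex(pkt["ds"]), counters)
# queue.af1 0x31 {'action.2': 1}
```

Because the DSCP value lives in its own parameter PRI, another mark action could reference the same PRI without duplicating it.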
Although packet dropping could be considered an Action, it is handled by the Algorithmic Dropper datapath functional element.

4.5.1. DSCP Mark Action PRC
This Action is applied to traffic in order to mark it with a DiffServ Codepoint (DSCP) value, specified in the dsDscpMarkActTable.

4.6. Queueing Elements
These include Algorithmic Droppers, Queues and Schedulers, which are all inter-related in their use of queueing techniques.

4.6.1. Algorithmic Dropper PRC
Algorithmic Droppers are represented in this PIB by instances of the Algorithmic Dropper PRC. An Algorithmic Dropper is assumed to operate indiscriminately on all packets that are presented at its input; all traffic separation should be done by classifiers and meters preceding it.

The Algorithmic Dropper covers many types of droppers, from the simple always dropper to the more complex random dropper. The type is indicated by the dsAlgDropType attribute. Algorithmic Droppers have a close relationship with queuing: each Algorithmic Dropper Table entry contains a dsAlgDropQMeasure attribute, indicating which queue's state affects the calculation of the Algorithmic Dropper. Each entry also contains a dsAlgDropNext attribute that indicates the queue into which the Algorithmic Dropper sinks its traffic. Algorithmic Droppers may also contain a pointer, dsAlgDropSpecific, to specific details of the drop algorithm. This PIB defines the details for three drop algorithms: Tail Drop, Head Drop, and Random Drop; other algorithms are outside the scope of this PIB module, but the general framework is intended to allow for their inclusion via other PIB modules.

One generally applicable parameter of a dropper is the specification of a queue-depth threshold at which some drop action is to start. This is represented in this PIB as a base attribute, dsAlgDropQThreshold, of the Algorithmic Dropper entry. The attribute dsAlgDropQMeasure specifies which queue's depth dsAlgDropQThreshold is to be compared against.
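The threshold comparison shared by the always, tail, and head dropper types described below can be sketched in Python. The measured queue (dsAlgDropQMeasure role) and the sink queue (dsAlgDropNext role) are separate arguments, although they are usually the same queue. Counting the threshold in packets and the string labels used for the dropper types are simplifications for this sketch only.

```python
from collections import deque
from typing import Deque, List


def alg_drop(drop_type: str, measure: Deque[str], sink: Deque[str],
             threshold: int, pkt: str) -> List[str]:
    """Offer pkt to an algorithmic dropper and return the packets it discards.

    measure plays the role of the queue named by dsAlgDropQMeasure, sink the
    queue named by dsAlgDropNext, and threshold dsAlgDropQThreshold (counted in
    packets here purely for simplicity).
    """
    if drop_type == "alwaysDrop":
        return [pkt]  # no parameters needed: everything is dropped
    if drop_type == "tailDrop":
        if len(measure) >= threshold:
            return [pkt]  # threshold reached: discard the arriving packet
        sink.append(pkt)
        return []
    if drop_type == "headDrop":
        dropped: List[str] = []
        if measure and len(measure) >= threshold:
            dropped.append(measure.popleft())  # discard the packet at the head instead
        sink.append(pkt)
        return dropped
    raise ValueError(f"dropper type {drop_type!r} is not covered by this sketch")


q: Deque[str] = deque()
for name in ("p1", "p2", "p3"):
    print(name, alg_drop("tailDrop", q, q, 2, name))
# p1 []
# p2 []
# p3 ['p3']
```

Random droppers need the additional parameters discussed in the next section and are not covered by this function.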
o An Always Dropper drops every packet presented to it. This type of dropper does not require any other parameter.

o A Tail Dropper requires the specification of a maximum queue depth threshold: when the queue pointed at by dsAlgDropQMeasure reaches that depth threshold, dsAlgDropQThreshold, any new traffic arriving at the dropper is discarded. This algorithm uses only parameters that are part of the dsAlgDropEntry.

o A Head Dropper requires the specification of a maximum queue depth threshold: when the queue pointed at by dsAlgDropQMeasure reaches that depth threshold, dsAlgDropQThreshold, traffic currently at the head of the queue is discarded. This algorithm uses only parameters that are part of the dsAlgDropEntry.

o Random Droppers are recommended in [QUEUEMGMT] as a way to control congestion and are called for in [AF-PHB]. Various implementations exist that agree on marking or dropping just enough traffic to communicate congestion to TCP-like protocols for congestion avoidance, but differ markedly in their specific parameters. This PIB attempts to offer a minimal set of controls for any random dropper, but expects that vendors will augment the PRC with additional controls and status in accordance with their implementation. This algorithm requires additional parameters on top of those in dsAlgDropEntry; these are discussed below.

A Dropper Type of other is provided for the implementation of dropper types not defined here. When the Dropper Type is other, its full specification will need to be provided by another PRC referenced by dsAlgDropSpecific. A Dropper Type of Multiple Queue Random Dropper is also provided; see section 5.5.3 of this document for more details.

4.6.2. Random Dropper PRC
One example of a random dropper is a RED-like dropper. An example of the representation chosen in this PIB for this element is shown in Figure 1.

Random droppers often have their drop probability function described as a plot of drop probability (P) against averaged queue length (Q). (Qmin, Pmin) then defines the start of the characteristic plot. Normally Pmin = 0, meaning that with an average queue length below Qmin, there will be no drops. (Qmax, Pmax) defines a "knee" on the plot, after which point the drop probability becomes more progressive (greater slope). (Qclip, 1) defines the queue length at which all packets will be dropped. Notice that this is different from Tail Drop because it uses an averaged queue length. It is possible for Qclip to equal Qmax.

In the PIB module, dsRandomDropMinThreshBytes and dsRandomDropMinThreshPkts represent Qmin. dsRandomDropMaxThreshBytes and dsRandomDropMaxThreshPkts represent Qmax. dsAlgDropQThreshold represents Qclip. dsRandomDropProbMax represents Pmax. This PIB does not represent Pmin (assumed to be zero unless otherwise represented). In addition, since message memory is finite, queues generally have some upper bound above which they are incapable of storing additional traffic. Normally this number is equal to Qclip, specified by dsAlgDropQThreshold.

Each random dropper specification is associated with a queue. This allows multiple drop processes (of same or different types) to be associated with the same queue, as different PHB implementations may require. This also allows for sequences of multiple droppers if necessary.

       +-----------------+                  +-------+
       |AlgDrop          |                  |Queue  |
  ---->| Next   ---------+-+--------------->| Next -+-->
       | QMeasure -------+-+                | ...   |
       | QThreshold      |                  +-------+
       | Type=randomDrop |   +----------------+
       | Specific -------+-->|RandomDrop      |
       +-----------------+   | MinThreshBytes |
                             | MaxThreshBytes |
                             | ProbMax        |
                             | Weight         |
                             | SamplingRate   |
                             +----------------+

     Figure 1: Example Use of the RandomDropTable for Random Droppers

The calculation of a smoothed queue length may also have an important bearing on the behavior of the dropper: parameters may include the sampling interval or rate, and the weight of each sample. The performance may be very sensitive to the values of these parameters, and a wide range of possible values may be required due to a wide range of link speeds. Most algorithms include a sample weight, represented here by dsRandomDropWeight. The availability of dsRandomDropSamplingRate as readable is important; the information provided by the Sampling Rate is essential to the configuration of dsRandomDropWeight.
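The following Python sketch shows one way a RED-like dropper could combine an exponentially weighted moving average of the queue length with the piecewise-linear curve through (Qmin, 0), (Qmax, Pmax), and (Qclip, 1) described above. It takes Pmin to be zero and uses plain fractions for the weight and for Pmax; the PIB's own encodings of dsRandomDropWeight and dsRandomDropProbMax are not reproduced here, and the sketch is not presented as any particular vendor's algorithm.

```python
import random
from typing import Optional


def smoothed_queue(avg: float, sample: float, weight: float) -> float:
    """Exponentially weighted moving average of the queue length (0 < weight <= 1)."""
    return (1.0 - weight) * avg + weight * sample


def drop_probability(avg_q: float, q_min: float, q_max: float,
                     q_clip: float, p_max: float) -> float:
    """Piecewise-linear curve through (Qmin, 0), (Qmax, Pmax) and (Qclip, 1)."""
    if avg_q < q_min:
        return 0.0  # Pmin taken as zero
    if avg_q < q_max:  # here q_min <= avg_q < q_max, so q_max > q_min
        return p_max * (avg_q - q_min) / (q_max - q_min)
    if avg_q < q_clip:  # second segment, after the "knee" at (Qmax, Pmax)
        return p_max + (1.0 - p_max) * (avg_q - q_max) / (q_clip - q_max)
    return 1.0  # at or beyond Qclip every packet is dropped


def should_drop(avg_q: float, q_min: float, q_max: float, q_clip: float,
                p_max: float, rng: Optional[random.Random] = None) -> bool:
    """Random drop decision for one arriving packet."""
    rng = rng or random.Random()
    return rng.random() < drop_probability(avg_q, q_min, q_max, q_clip, p_max)


avg = 0.0
for sample in (20, 20, 20):  # instantaneous queue lengths, in packets
    avg = smoothed_queue(avg, sample, weight=0.5)
print(avg, round(drop_probability(avg, 10, 20, 40, 0.1), 3))
# 17.5 0.075
```

Setting q_min = q_max = q_clip turns `drop_probability` into a 0/1 step at that length, which corresponds to the deterministic-dropper case noted at the end of this section.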
Making the Sampling Rate configurable is also helpful, because as line speed increases, the ability to sample the queue less frequently than packets arrive becomes necessary. Note, however, that there is ongoing research on this topic; see, e.g., [ACTQMGMT] and [AQMROUTER].

Additional parameters may be added in an enterprise PIB module, e.g., by using AUGMENTS on this class, to handle aspects of random drop algorithms that are not standardized here.

NOTE: Deterministic Droppers can be viewed as a special case of Random Droppers with the drop probability restricted to 0 and 1. Hence Deterministic Droppers might be described by a Random Dropper with Pmin = 0, Pmax = 1, and Qmin = Qmax = Qclip, the averaged queue length at which dropping occurs.

4.6.3. Queues and Schedulers
The Queue PRC models simple FIFO queues, as described in [MODEL] section 7.1.1. The Scheduler PRC allows flexibility in constructing both simple and somewhat more complex queueing hierarchies from those queues. Since TCBs can be cascaded multiple times on an interface, even more complex hierarchies can also be constructed that way.

Queue PRC instances are pointed at by the "next" attributes of the upstream elements, e.g., dsMeterSucceedNext. Note that multiple upstream elements may direct their traffic to the same Queue PRI. For example, the Assured Forwarding PHB suggests that all traffic marked AF11, AF12, or AF13 be placed in the same queue after metering, without reordering. This would be represented by having the dsMeterSucceedNext of each upstream meter point at the same Queue PRI.

NOTE: Queue and Scheduler PRIs are for data path description; they both use Scheduler Parameterization Table entries for diffserv treatment parameterization.

A Queue Table entry specifies the scheduler it wants service from by use of its Next pointer. Each Scheduler Table entry represents the algorithm in use for servicing the one or more queues that feed it. [MODEL] section 7.1.2 describes a scheduler with multiple inputs: this is represented in the PIB by associating the scheduling parameters with each input. In this way, sets of Queues can be grouped together as inputs to the same Scheduler. This class serves to represent the example scheduler described in [MODEL]: other, more complex representations might be created outside of this PIB.
Both the Queue PRC and the Scheduler PRC use instances of the Scheduler Parameterization PRC to specify diffserv treatment parameterization. Scheduler Parameter PRC instances are used to parameterize each input that feeds into a scheduler. The inputs can be a mixture of Queue PRIs and Scheduler PRIs. Scheduler Parameter PRIs can be used and reused by one or more Queue and/or Scheduler Table entries.

For representing a Strict Priority scheduler, each scheduler input is assigned a priority with respect to all the other inputs feeding the same scheduler, with default values for the other parameters. A higher-priority input that contains traffic not being delayed for shaping will be serviced before a lower-priority input.

For Weighted Scheduling methods, e.g., WFQ and WRR, the "weight" of a given scheduler input is represented with a Minimum Service Rate leaky-bucket profile that provides a guaranteed minimum bandwidth to that input, if required. This is represented by a rate dsMinRateAbsolute; the classical weight is the ratio between that rate and the interface speed, or perhaps the ratio between that rate and the sum of the configured rates for classes. Alternatively, the rate may be represented by a relative value, as a fraction of the interface's current line rate, dsMinRateRelative, to assist in cases where line rates are variable or where a higher-level policy might be expressed in terms of fractions of network resources. The two rate parameters are inter-related, and changes in one may be reflected in the other.

For weighted scheduling methods, one can say loosely that WRR focuses on bandwidth sharing, without concern for relative delay amongst the queues, whereas WFQ controls both queue service order and the amount of traffic serviced, providing both bandwidth sharing and relative delay ordering amongst the queues.
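A Python sketch of the two scheduling ideas above, under stated assumptions: strict priority picks the highest-priority input that has traffic and is not being held back by a shaper, and the classical weights are each input's minimum rate divided by the sum of the configured minimum rates. Rates are in kbit/s and relative rates are plain fractions here; both are assumptions, and the names are hypothetical rather than PIB attributes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SchedInput:
    """One scheduler input (a queue or another scheduler) with its parameters."""
    name: str
    priority: int                 # strict-priority rank; higher is served first
    backlog: int                  # packets currently waiting on this input
    held_by_shaper: bool = False  # True while a maximum-rate profile delays it


def pick_strict_priority(inputs: List[SchedInput]) -> Optional[str]:
    """Return the highest-priority input that has traffic and is not being shaped."""
    eligible = [i for i in inputs if i.backlog > 0 and not i.held_by_shaper]
    if not eligible:
        return None
    return max(eligible, key=lambda i: i.priority).name


def classical_weights(min_rates_kbps: Dict[str, float]) -> Dict[str, float]:
    """Each input's minimum rate divided by the sum of configured minimum rates."""
    total = sum(min_rates_kbps.values())
    if total <= 0:
        raise ValueError("at least one positive minimum rate is required")
    return {name: rate / total for name, rate in min_rates_kbps.items()}


def absolute_from_relative(fraction: float, line_rate_kbps: float) -> float:
    """Convert a rate given as a fraction of the line rate into an absolute rate."""
    return fraction * line_rate_kbps


inputs = [SchedInput("ef", 7, backlog=0), SchedInput("af1", 5, backlog=3),
          SchedInput("be", 0, backlog=9)]
print(pick_strict_priority(inputs))  # af1 (ef has nothing queued)
inputs[1].held_by_shaper = True
print(pick_strict_priority(inputs))  # be
print(classical_weights({"af1": 2000, "af2": 1000, "be": 1000}))
# {'af1': 0.5, 'af2': 0.25, 'be': 0.25}
print(absolute_from_relative(0.25, 100_000))  # 25000.0
```

The relative form recomputes cleanly when the line rate changes, which is the motivation the text gives for offering it alongside the absolute rate.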
A queue or scheduled set of queues (which is an input to a scheduler) may also be capable of acting as a non-work-conserving [MODEL] traffic shaper: this is done by defining a Maximum Service Rate leaky-bucket profile in order to limit the scheduler bandwidth available to that input. This is represented by a rate, dsMaxRateAbsolute; the classical weight is the ratio between that rate and the interface speed, or perhaps the ratio between that rate and the sum of the configured rates for classes. Alternatively, the rate may be represented by a relative value, dsMaxRateRelative, expressed as a fraction of the interface's current line rate.
There was discussion in the working group about alternative modeling approaches, such as defining a shaping action or a shaping element. We did not take this approach because shaping is in fact something a scheduler does to its inputs (which we model as a queue with a maximum rate or a scheduler whose output has a maximum rate), and we felt it was simpler and more elegant to describe it in that context. Additionally, a multi-rate shaper [SHAPER] can be represented by the use of multiple dsMaxRateTable entries.
Other types of priority and weighted scheduling methods can be defined using existing parameters in dsMinRateEntry.
NOTE: dsSchedulerMethod uses AutonomousType syntax, with the different types of scheduling methods defined as OBJECT-IDENTITY. Future scheduling methods may be defined in other PIBs. This requires an OBJECT-IDENTITY definition, a description of how the existing objects are reused, if they are, and any new objects they require.
NOTE: Hierarchical schedulers can be parameterized using this PIB by having Scheduler Table entries feed into other Scheduler Table entries.
4.7. Specifying Device Capabilities
The DiffServ PIB uses the Base PRC classes frwkPrcSupportTable and frwkCompLimitsTable defined in [FR-PIB] to specify which PRCs are supported by a PEP and to specify any limitations on that support. The PIB also uses the capability PRCs frwkCapabilitySetTable and frwkIfRoleComboTable defined in [FR-PIB] to specify the device's capability sets, interface types, and role combinations. Each instance of the capability PRC frwkCapabilitySetTable contains an OID that points to an instance of a PRC that describes some capability of that interface type. The DiffServ PIB defines several of these capability PRCs, which assist the PDP with the configuration of DiffServ functional elements that can be implemented by the device. Each of these capability PRCs contains a direction attribute that specifies the direction for which the capability applies. This attribute is defined in a base capability PRC, which is extended by each specific capability PRC.
Classification capabilities, which specify the information elements the device can use to classify traffic, are reported using the dsIfClassificationCaps PRC. Metering capabilities, which indicate what the device can do with out-of-profile packets, are specified using the dsIfMeteringCaps PRC. Scheduling capabilities, such as the number of inputs supported, are reported using the dsIfSchedulingCaps PRC. Algorithmic drop capabilities, such as the types of algorithms supported, are reported using the dsIfAlgDropCaps PRC. Queue capabilities, such as the maximum number of queues, are reported using the dsIfQueueCaps PRC. Maximum Rate capabilities, such as the maximum number of maximum rate levels, are reported using the dsIfMaxRateCaps PRC.
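The pattern in which every specific capability PRC extends a base PRC carrying the direction attribute resembles class inheritance. The following Python sketch shows how a PDP might pick out the capabilities that apply to one direction of an interface type. All class and field names here are hypothetical; only the dsIfQueueCaps and dsIfSchedulingCaps PRC names come from the text, and their real attributes are defined in the PIB module.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Direction(Enum):
    INGRESS = 1
    EGRESS = 2


@dataclass
class BaseCaps:
    """Stands in for the base capability PRC that carries the direction."""
    direction: Direction


@dataclass
class QueueCaps(BaseCaps):
    """Stands in for dsIfQueueCaps; max_queues is a hypothetical field name."""
    max_queues: int


@dataclass
class SchedulingCaps(BaseCaps):
    """Stands in for dsIfSchedulingCaps; max_inputs is hypothetical."""
    max_inputs: int


def caps_for(capability_set: List[BaseCaps], kind: type,
             direction: Direction) -> List[BaseCaps]:
    """Select the capability instances of one kind that apply to a direction."""
    return [c for c in capability_set
            if isinstance(c, kind) and c.direction is direction]


# Hypothetical capability set for interface type "IfCap1".
if_cap1 = [QueueCaps(Direction.INGRESS, 4),
           QueueCaps(Direction.EGRESS, 8),
           SchedulingCaps(Direction.EGRESS, 8)]
```

Filtering on the base class returns every capability for a direction, which mirrors how the shared direction attribute lets one query span all capability PRCs.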
Two PRCs are defined to allow specification of the element linkage capabilities of the PEP. The dsIfElmDepthCaps PRC indicates the maximum number of functional datapath elements that can be linked consecutively in a datapath. The dsIfElmLinkCaps PRC indicates which functional datapath elements may follow a specific type of element in a datapath.
The capability reporting classes in the DiffServ and Framework PIBs are meant to allow the PEP to indicate some general guidelines about what the device can do. They are intended to be an aid to the PDP when it constructs policy for the PEP. These classes do not necessarily allow the PEP to indicate every possible configuration that it can or cannot support. If a PEP receives a policy that it cannot implement, it must notify the PDP with a failure report. The error handling mechanism specified in [COPS-PR] sections 4.4, 4.5, and 4.6 handles all currently known error cases of this PIB; hence no additional methods or PRCs need to be specified here.
5. PIB Usage Example
This section provides some examples of how the different table entries of this PIB may be used together for a DiffServ device. The usage of each individual attribute is defined within the PIB module itself. In the figures, the common prefix of each name is omitted: every table entry and attribute name is assumed to begin with "ds", followed by the table entry name, so that, for example, "SucceedNext" in a Meter box stands for dsMeterSucceedNext. "0.0" is used to mean zeroDotZero. For Scheduler Method, "= X" means "using the OID of diffServSchedulerX".
5.1. Data Path Example
Notice that each entry of the DataPath table is used for a specific interface type handling a flow in a specific direction for a specific functional role combination. For our example, we define just one such entry.
   +---------------------+
   |DataPath             |
   | CapSetName ="IfCap1"|
   | Roles = "A+B"       |
   | IfDirection=Ingress |    +---------+
   | Start --------------+--->|Clfr     |
   +---------------------+    | Id=Dept |
                              +---------+
Figure 2: DataPath Usage Example
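Because each DataPath entry is selected by interface type, direction, and role combination, it behaves like a lookup table keyed on that triple. A minimal Python sketch, assuming string keys and a string label for the Start target; the names DataPathKey and first_element are hypothetical, and role-combination strings are compared verbatim here rather than by any rules [FR-PIB] may define for them.

```python
from typing import Dict, NamedTuple, Optional


class DataPathKey(NamedTuple):
    cap_set_name: str  # interface type, e.g. "IfCap1"
    roles: str         # functional role combination, e.g. "A+B"
    direction: str     # "ingress" or "egress"


# One entry, mirroring Figure 2: Start points at the department classifier.
data_paths: Dict[DataPathKey, str] = {
    DataPathKey("IfCap1", "A+B", "ingress"): "Clfr Id=Dept",
}


def first_element(cap_set_name: str, roles: str,
                  direction: str) -> Optional[str]:
    """Return the Start element for a (type, roles, direction) triple."""
    return data_paths.get(DataPathKey(cap_set_name, roles, direction))
```

An egress flow on the same interface type finds no entry, so a second DataPath entry would be needed to give it a datapath.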
In Figure 2, IfCap1 indicates the interface type with capability set 1, handling ingress flows for the functional role combination "A+B". The Start attribute points at a classifier for departments, which leads us into the Classifier Example below.
5.2. Classifier and Classifier Element Example
We want to show how a multilevel classifier can be built using the classifier tables provided by this PIB. Notice that we do not go into details on the filters because they are not defined by this PIB. Continuing the Data Path example from the previous section, let's say we want to perform the following classification functionality to do flow separation based on department and application type:
   if (Dept1) then take Dept1-action {
       if (Appl1) then take Dept1-Appl1-action.
       if (Appl2) then take Dept1-Appl2-action.
       if (Appl3) then take Dept1-Appl3-action.
   }
   if (Dept2) then take Dept2-action {
       if (Appl1) then take Dept2-Appl1-action.
       if (Appl2) then take Dept2-Appl2-action.
       if (Appl3) then take Dept2-Appl3-action.
   }
   if (Dept3) then take Dept3-action {
       if (Appl1) then take Dept3-Appl1-action.
       if (Appl2) then take Dept3-Appl2-action.
       if (Appl3) then take Dept3-Appl3-action.
   }
The above classification logic is translated into the following PIB table entries, with two levels of classification.
First for department:
   +---------+
   |Clfr     |
   | Id=Dept |
   +---------+
   +-------------+       +-----------+
   |ClfrElement  |   +-->|Clfr       |
   | Id=Dept1    |   |   | Id=D1Appl |
   | ClfrId=Dept |   |   +-----------+
   | Preced=NA   |   |
   | Next -------+---+   +------------+
   | Specific ---+------>|Filter Dept1|
   +-------------+       +------------+
   +-------------+       +-----------+
   |ClfrElement  |   +-->|Clfr       |
   | Id=Dept2    |   |   | Id=D2Appl |
   | ClfrId=Dept |   |   +-----------+
   | Preced=NA   |   |
   | Next -------+---+   +------------+
   | Specific ---+------>|Filter Dept2|
   +-------------+       +------------+
   +-------------+       +-----------+
   |ClfrElement  |   +-->|Clfr       |
   | Id=Dept3    |   |   | Id=D3Appl |
   | ClfrId=Dept |   |   +-----------+
   | Preced=NA   |   |
   | Next -------+---+   +------------+
   | Specific ---+------>|Filter Dept3|
   +-------------+       +------------+
Second for application:
   +-----------+
   |Clfr       |
   | Id=D1Appl |
   +-----------+
   +---------------+                   +--------------+
   |ClfrElement    |  +--------------->|Meter         |
   | Id=D1Appl1    |  |                | Id=D1A1Rate1 |
   | ClfrId=D1Appl |  |                | SucceedNext -+--->...
   | Preced=NA     |  |                | FailNext ----+--->...
   | Next ---------+--+ +------------+ | Specific ----+--->...
   | Specific -----+--->|Filter Appl1| +--------------+
   +---------------+    +------------+
   +---------------+                   +--------------+
   |ClfrElement    |  +--------------->|Meter         |
   | Id=D1Appl2    |  |                | Id=D1A2Rate1 |
   | ClfrId=D1Appl |  |                | SucceedNext -+--->...
   | Preced=NA     |  |                | FailNext ----+--->...
   | Next ---------+--+ +------------+ | Specific ----+--->...
   | Specific -----+--->|Filter Appl2| +--------------+
   +---------------+    +------------+
   +---------------+                   +--------------+
   |ClfrElement    |  +--------------->|Meter         |
   | Id=D1Appl3    |  |                | Id=D1A3Rate1 |
   | ClfrId=D1Appl |  |                | SucceedNext -+--->...
   | Preced=NA     |  |                | FailNext ----+--->...
   | Next ---------+--+ +------------+ | Specific ----+--->...
   | Specific -----+--->|Filter Appl3| +--------------+
   +---------------+    +------------+
Figure 3: Classifier Usage Example
The application classifiers for departments 2 and 3 are very similar to the application classifier for department 1 shown above. Notice that in this example, the filters for Appl1, Appl2, and Appl3 are reused across the application classifiers. This classifier and classifier element example assumes that the next differentiated services functional datapath element is a Meter, which leads us into the Meter Example section.
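The two-level structure in Figure 3 can be mirrored in a short Python sketch. It is an illustration under assumptions, not an implementation of the PIB: packets are modeled as dictionaries, filters as predicates (the PIB leaves filters to other PIBs), and the names ClfrElement, Clfr, classify, and field_equals are hypothetical. A ClfrElement's Next either points at another classifier or leaves the classifier for a Meter, identified here by its Id.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

Packet = Dict[str, str]  # hypothetical packet summary, e.g. {"dept": ...}


@dataclass
class ClfrElement:
    name: str
    matches: Callable[[Packet], bool]  # stands in for the Specific filter
    next: Union["Clfr", str]           # another Clfr, or a Meter Id


@dataclass
class Clfr:
    name: str
    elements: List[ClfrElement] = field(default_factory=list)


def classify(clfr: Clfr, pkt: Packet) -> str:
    """Follow classifier levels until an element's Next leaves the classifier."""
    while True:
        # The filters in one classifier are assumed not to overlap
        # (Preced=NA in the figures), so the first match is the only match.
        elem = next((e for e in clfr.elements if e.matches(pkt)), None)
        if elem is None:
            raise LookupError(f"no element of {clfr.name} matches {pkt}")
        if isinstance(elem.next, Clfr):
            clfr = elem.next
        else:
            return elem.next


def field_equals(key: str, value: str) -> Callable[[Packet], bool]:
    """A trivial filter; one instance per application is reused below."""
    return lambda pkt: pkt.get(key) == value


app_filters = {a: field_equals("app", f"Appl{a}") for a in (1, 2, 3)}
dept_clfr = Clfr("Dept")
for d in (1, 2, 3):
    appl_clfr = Clfr(f"D{d}Appl", [
        ClfrElement(f"D{d}Appl{a}", app_filters[a], f"D{d}A{a}Rate1")
        for a in (1, 2, 3)])
    dept_clfr.elements.append(
        ClfrElement(f"Dept{d}", field_equals("dept", f"Dept{d}"), appl_clfr))
```

A Dept1/Appl1 packet resolves to the Meter D1A1Rate1, matching the first block of Figure 3; the same three application filter objects are shared by all department classifiers.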
5.3. Meter Example
A simple single-rate Meter is easy to envision, so we give a Two Rate Three Color [TRTCM] example instead, using two Meter table entries and two TBParam table entries.
   +--------------+    +---------+    +--------------+    +----------+
   |Meter         | +->|Action   | +->|Meter         | +->|Action    |
   | Id=D1A1Rate1 | |  | Id=Green| |  | Id=D1A1Rate2 | |  | Id=Yellow|
   | SucceedNext -+-+  +---------+ |  | SucceedNext -+-+  +----------+
   | FailNext ----+----------------+  | FailNext ----+--+  +-------+
   | Specific -+  |                   | Specific -+  |  +->|Action |
   +-----------+--+                   +-----------+--+     | Id=Red|
               |                                  |        +-------+
               |  +------------+                  |  +------------+
               +->|TBParam     |                  +->|TBParam     |
                  | Type=TRTCM |                     | Type=TRTCM |
                  | Rate       |                     | Rate       |
                  | BurstSize  |                     | BurstSize  |
                  | Interval   |                     | Interval   |
                  +------------+                     +------------+
Figure 4: Meter Usage Example
For [TRTCM], the first-level TBParam entry is used for the Committed Information Rate and Committed Burst Size token bucket, and the second-level TBParam entry is used for the Peak Information Rate and Peak Burst Size token bucket. The other meters needed for this example depend on the service class each classified flow uses, but their construction is similar to the example given here. The TBParam table entries can be shared by multiple Meter table entries. In this example, the differentiated services functional datapath element following the Meter is an Action, detailed in the following section.
5.4. Action Example
Typically, a Mark Action is used; we continue with the "Action, Id=Green" branch of the Meter example. Recall that this is the D1A1Rate1 SucceedNext branch, meaning the flow belongs to Department 1 Application 1 and is within the committed rate and burst size limits for this flow. We would like to mark this flow with a specific DSCP and also with a device internal label.
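The Green branch followed here is one of three outcomes of the [TRTCM] meter cascade in Figure 4. A minimal Python sketch of the color-blind two rate three color marker, with each color annotated by the Meter branch that leads to its Action, may help when tracing which Action a packet reaches. It assumes rates in bytes per second and burst sizes in bytes, as in [TRTCM]; the units of the PIB's TBParam attributes are defined in the PIB module, and how a device maps the two cascaded Meter PRIs onto its metering hardware is implementation-specific. TokenBucket and TrTCM are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenBucket:
    rate: float                     # tokens (bytes) added per second
    burst: float                    # bucket depth in bytes
    tokens: Optional[float] = None  # current fill; starts full
    last: float = 0.0               # time of the previous update, seconds

    def __post_init__(self) -> None:
        if self.tokens is None:
            self.tokens = self.burst

    def refill(self, now: float) -> None:
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now


class TrTCM:
    """Color-blind two rate three color marker in the style of [TRTCM]."""

    def __init__(self, cir: float, cbs: float, pir: float, pbs: float):
        self.committed = TokenBucket(cir, cbs)  # first-level TBParam
        self.peak = TokenBucket(pir, pbs)       # second-level TBParam

    def color(self, size: int, now: float) -> str:
        """Return the Action Id reached by a packet of `size` bytes."""
        self.committed.refill(now)
        self.peak.refill(now)
        if self.peak.tokens < size:
            return "Red"       # D1A1Rate2 FailNext
        self.peak.tokens -= size
        if self.committed.tokens < size:
            return "Yellow"    # D1A1Rate2 SucceedNext
        self.committed.tokens -= size
        return "Green"         # D1A1Rate1 SucceedNext
```

For example, with CIR 1000, CBS 1500, PIR 2000 and PBS 3000, three back-to-back 1500-byte packets are marked Green, Yellow and Red in turn, because the first drains the committed bucket and the second drains the peak bucket.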
   +-----------+                   +-----------+  +--->AlgDropAF11
   |Action     | +----------------->|Action     |  |
   | Next -----+-+ +------------+  | Next -----+--+ +-------------+
   | Specific -+-->|DscpMarkAct |  | Specific -+--->|ILabelMarker |
   +-----------+   | Dscp=AF11  |  +-----------+    | ILabel=D1A1 |
                   +------------+                    +-------------+
Figure 5: Action Usage Example
This example uses the frwkILabelMarker PRC defined in [FR-PIB], showing the device internal label being used to indicate the micro flow that feeds into the aggregated AF flow. This device internal label may be used for flow accounting purposes and/or other data path treatments.
5.5. Dropper Examples
The Dropper examples below continue from the Action example above for the AF11 flow. We provide three different dropper setups, from simple to complex. The examples below may include some queuing structures; these are shown only to illustrate the relationship of the droppers to queuing and are not complete. A queuing example is provided in a later section.
5.5.1. Tail Dropper Example
The Tail Dropper is one of the simplest droppers. For this example, we just want to drop the part of the flow that exceeds the queue's buffering capacity, 2 Mbytes.
   +--------------------+    +------+
   |AlgDrop             | +->|Q AF1 |
   | Id=AF11            | |  +------+
   | Type=tailDrop      | |
   | Next --------------+-+
   | QMeasure ----------+-+
   | QThreshold=2Mbytes |
   | Specific=0.0       |
   +--------------------+
Figure 6: Tail Dropper Usage Example
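The tail-drop decision in Figure 6 is simple enough to state in a few lines of Python. This sketch assumes that the threshold is a hard limit on bytes queued including the arriving packet, and that "2 Mbytes" means 2,000,000 bytes; whether a given device compares before or after adding the packet is implementation-specific. TailDropQueue is a hypothetical name.

```python
from collections import deque

QTHRESHOLD_BYTES = 2_000_000  # "2 Mbytes"; decimal megabytes assumed


class TailDropQueue:
    """Q AF1 fronted by a tail dropper (AlgDrop Type=tailDrop)."""

    def __init__(self, threshold: int = QTHRESHOLD_BYTES):
        self.threshold = threshold  # plays the role of QThreshold
        self.packets = deque()      # FIFO of queued packet sizes
        self.depth = 0              # bytes queued: what QMeasure observes

    def enqueue(self, size: int) -> bool:
        """Queue the packet (Next) or tail-drop it; True if it was queued."""
        if self.depth + size > self.threshold:
            return False
        self.packets.append(size)
        self.depth += size
        return True

    def dequeue(self) -> int:
        """Remove the oldest packet, as the scheduler would."""
        size = self.packets.popleft()
        self.depth -= size
        return size
```

Because Next and QMeasure reference the same queue here, the dropper and the queue share one depth counter; a dropper whose QMeasure pointed at a different queue would read that other queue's depth instead.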
5.5.2. Single Queue Random Dropper Example
The use of a Random Dropper introduces the usage of dsRandomDropEntry, as in the example below.
   +-----------------+    +------+
   |AlgDrop          | +->|Q AF1 |
   | Id=AF11         | |  +------+
   | Type=randomDrop | |
   | Next -----------+-+
   | QMeasure -------+-+
   | QThreshold      |   +----------------+
   | Specific -------+-->|RandomDrop      |
   +-----------------+   | MinThreshBytes |
                         | MinThreshPkts  |
                         | MaxThreshBytes |
                         | MaxThreshPkts  |
                         | ProbMax        |
                         | Weight         |
                         | SamplingRate   |
                         +----------------+
Figure 7: Single Queue Random Dropper Usage Example
Notice that for the Random Dropper, dsAlgDropQThreshold contains the maximum average queue length, Qclip, for the queue being measured as indicated by dsAlgDropQMeasure; the rest of the Random Dropper parameters are specified by the dsRandomDropEntry referenced by dsAlgDropSpecific. In this example, both dsAlgDropNext and dsAlgDropQMeasure reference the same queue. This is the simple case, but dsAlgDropQMeasure may reference another queue for a PEP implementation that supports this feature.
5.5.3. Multiple Queue Random Dropper Example
When a network device implementation requires measuring multiple queues to determine the behavior of a drop algorithm, the existing PRCs defined in this PIB are sufficient for the simple case, as this example shows.
   +-------------+                                         +------+
   |AlgDrop      | +----------------+-------------------+->|Q_AF1 |
   | Id=AF11     | |                |                   |  +------+
   | Type=mQDrop | |                |                   |
   | Next -------+-+ +------------+ |    +------------+ |
   | QMeasure ---+-->|MQAlgDrop   | | +->|MQAlgDrop   | |
   | QThreshold  |   | Id=AF11A   | | |  | Id=AF11B   | |
   | Specific    |   | Type       | | |  | Type       | |
   +-------------+   | Next ------+-+ |  | Next ------+-+
                     | ExceedNext +---+  | ExceedNext |    +------+
                     | QMeasure --+-+    | QMeasure --+--->|Q_AF2 |
                     | QThreshold | |    | QThreshold |    +------+
                     | Specific + | |    | Specific + |
                     +----------+-+ |    +----------+-+
                                |   |  +------+     |
                                |   +->|Q_AF1 |     |
                                |      +------+     |
                                |                   |
                        +-------+--------+  +-------+--------+
                        |RandomDrop      |  |RandomDrop      |
                        | MinThreshBytes |  | MinThreshBytes |
                        | MinThreshPkts  |  | MinThreshPkts  |
                        | MaxThreshBytes |  | MaxThreshBytes |
                        | MaxThreshPkts  |  | MaxThreshPkts  |
                        | ProbMax        |  | ProbMax        |
                        | Weight         |  | Weight         |
                        | SamplingRate   |  | SamplingRate   |
                        +----------------+  +----------------+
Figure 8: Multiple Queue Random Dropper Usage Example
For this example, we have two queues, Q_AF1 and Q_AF2, sharing the same buffer resources. We want to make sure the common buffer resource is sufficient to service the AF11 traffic, and we want to measure the two queues to determine the drop algorithm for AF11 traffic feeding into Q_AF1. Notice that mQDrop is used for the dsAlgDropType of the dsAlgDropEntry to indicate the Multiple Queue Dropping Algorithm. The common shared buffer resource is indicated by the use of the dsAlgDropEntry, with its attributes used as follows:
- dsAlgDropType indicates the algorithm used, mQDrop.
- dsAlgDropNext indicates the next functional data path element to handle the flow when no drop occurs.
- dsAlgDropQMeasure is used as the anchor for the list of dsMQAlgDropEntry instances, one for each queue being measured.
- dsAlgDropQThreshold indicates the size of the shared buffer pool.
- dsAlgDropSpecific can be used to reference instances of an additional PRC (not defined in this PIB) if more parameters are required to describe the common shared buffer resource.
For this example, there are two subsequent dsMQAlgDropEntry instances, one for each queue being measured, with their attributes used as follows:
- dsMQAlgDropType indicates the algorithm used; for this example, both dsMQAlgDropType attributes use randomDrop.
- dsMQAlgDropQMeasure indicates the queue being measured; for this example, Q_AF1 and Q_AF2 are the two queues used.
- dsMQAlgDropNext indicates the next functional data path element to handle the flow when no drop occurs.
- dsMQAlgDropExceedNext indicates the next queue's dsMQAlgDropEntry, with zeroDotZero indicating the last queue.
- dsAlgDropQThreshold is used as in the single queue Random Dropper.
- dsAlgDropSpecific is used to reference the PRID that describes the dropper parameters, as in its normal usage. For this example, both dsAlgDropSpecific attributes reference dsRandomDropEntry instances.
Notice that the anchoring dsAlgDropEntry and the two dsMQAlgDropEntry instances all have their Next attribute pointing to Q_AF1. This indicates:
- If the packet does not need to be checked by the individual queues' drop processing because of an abundance of common shared buffer resources, then the packet is sent to Q_AF1.
- If the packet is not dropped due to current Q_AF1 conditions, then it is sent to Q_AF1.
- If the packet is not dropped due to current Q_AF2 conditions, then it is sent to Q_AF1.
This example also uses two dsRandomDropEntry instances for the two queues it measures. Their attribute usage is the same as for a single queue random dropper. Other, more complex result combinations can be achieved by specifying a new PRC and referencing it with the dsAlgDropSpecific of the anchoring dsAlgDropEntry.
A simpler usage is also possible when a single set of drop parameters is used for all queues being measured. This parameter set, too, can be referenced by the dsAlgDropSpecific of the anchoring dsAlgDropEntry. Such PRCs are not defined in this PIB.
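The multiple queue arrangement of Figure 8 can be traced with a Python sketch. Two assumptions are stated loudly here and in the code. First, it adopts one reading of the text: when the shared pool (the anchor's QThreshold) is not full, the per-queue checks are skipped; otherwise every entry on the ExceedNext chain applies its random-drop test to its own measured queue, and the packet reaches Q_AF1 only if none of them drops it. Second, the random dropper is a simplified RED (no inter-drop count correction, floats instead of the PIB's scaled integers, no packet thresholds, SamplingRate, or per-entry Qclip). RandomDropParams, MQAlgDrop, and mq_drop are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RandomDropParams:
    """Float stand-ins for dsRandomDropEntry fields (simplified, assumed)."""
    min_thresh: float  # MinThreshBytes
    max_thresh: float  # MaxThreshBytes
    prob_max: float    # ProbMax, as a probability in [0, 1]
    weight: float      # Weight, the averaging gain in (0, 1]
    avg: float = 0.0   # running average queue length, bytes

    def drop_probability(self, qlen: float) -> float:
        """Simplified RED: update the average, then ramp linearly from 0
        at min_thresh to prob_max at max_thresh; drop all above that."""
        self.avg += self.weight * (qlen - self.avg)
        if self.avg < self.min_thresh:
            return 0.0
        if self.avg >= self.max_thresh:
            return 1.0
        return self.prob_max * (self.avg - self.min_thresh) / (
            self.max_thresh - self.min_thresh)


@dataclass
class MQAlgDrop:
    measured_queue: str                        # dsMQAlgDropQMeasure
    params: RandomDropParams                   # reached via the Specific pointer
    exceed_next: Optional["MQAlgDrop"] = None  # None stands for zeroDotZero


def mq_drop(pool_size: int, depths: Dict[str, int], first: MQAlgDrop,
            rng: random.Random) -> bool:
    """True to drop the AF11 packet, False to send it to Q_AF1 (Next).

    Assumed reading (see lead-in); averages are only updated when the
    per-queue checks actually run.
    """
    if sum(depths.values()) < pool_size:
        return False  # shared buffer is plentiful
    entry = first
    while entry is not None:
        p = entry.params.drop_probability(depths[entry.measured_queue])
        if rng.random() < p:
            return True
        entry = entry.exceed_next
    return False
```

A different combination rule, such as dropping only when every measured queue votes to drop, would be exactly the kind of "more complex result combination" the text suggests defining in a new PRC.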
5.6. Queue and Scheduler Example
The queue and scheduler example continues from the dropper example in the previous section, concentrating on the queue and scheduler DiffServ datapath functional elements. Notice that a shaper is constructed using a queue and a scheduler with MaxRate parameters.
        +------------+                         +-----------------+
   ---->|Q           |                      +->|Scheduler        |
        | Id=EF      |                      |  | Id=DiffServ     |
        | Next ------+----------------------+  | Next=0.0        |
        | MinRate ---+--+                   |  | Method=Priority |
        | MaxRate -+ |  |                   |  | MinRate=0.0     |
        +----------+-+  |   +----------+    |  | MaxRate=0.0     |
                   |    +-->|MinRate   |    |  +-----------------+
                   |        | Priority |    |
                   |        | Absolute |    |
                   |        | Relative |    |
                   |        +----------+    |
                   |        +-----------+   |
                   +------->|MaxRate    |   |
                            | Level     |   |
                            | Absolute  |   |
                            | Relative  |   |
                            | Threshold |   |
                            +-----------+   |
                                            +-----------------+
                                                              |
        +------------+                       +--------------+ |
   ---->|Q           |                    +->|Scheduler     | |
        | Id=AF1     |                    |  | Id=AF        | |
        | Next ------+--------------------+  | Next --------+-+
        | MinRate ---+--+                 |  | Method=WRR   |
        | MaxRate    |  |                 |  | MaxRate      |
        +------------+  |   +----------+  |  | MinRate -+   |
                        +-->|MinRate   |  |  +----------+---+
                            | Priority |  |             |  +----------+
                            | Absolute |  |             +->|MinRate   |
                            | Relative |  |                | Priority |
                            +----------+  |                | Absolute |
                                          |                | Relative |
        +------------+                    |                +----------+
   ---->|Q           |                    |
        | Id=AF2     |                    |
        | Next ------+--------------------+
        | MinRate ---+--+                 |
        | MaxRate    |  |                 |
        +------------+  |   +----------+  |
                        +-->|MinRate   |  |
                            | Priority |  |
                            | Absolute |  |
                            | Relative |  |
                            +----------+  |
                                          |
        +------------+                    |
   ---->|Q           |                    |
        | Id=AF3     |                    |
        | Next ------+--------------------+
        | MinRate ---+--+
        | MaxRate    |  |
        +------------+  |   +----------+
                        +-->|MinRate   |
                            | Priority |
                            | Absolute |
                            | Relative |
                            +----------+
Figure 9: Queue and Scheduler Usage Example
This example shows the queuing system for handling EF, AF1, AF2, and AF3 traffic. It is assumed that AF11, AF12, and AF13 traffic feeds into Queue AF1, and likewise for AF2x and AF3x traffic. The AF1, AF2, and AF3 Queues are serviced by the AF Scheduler using a Weighted Round Robin method. The AF Scheduler services each of the queues feeding into it based on the minimum rate parameters of each queue. The AF and EF traffic is serviced by the DiffServ Scheduler using a Strict Priority method. The DiffServ Scheduler services each of its inputs based on their priority parameter. Notice that there is an upper bound to the servicing of EF traffic by the DiffServ Scheduler. This is accomplished with the use of maximum rate parameters: the DiffServ Scheduler uses both the maximum rate and priority parameters when servicing the EF Queue. The DiffServ Scheduler is the last DiffServ datapath functional element in this datapath, so it uses zeroDotZero in its Next attribute.
6. Summary of the DiffServ PIB
The DiffServ PIB consists of one module containing the base PRCs for setting DiffServ policy, queues, classifiers, meters, etc., and also contains capability PRCs that allow a PEP to specify its device characteristics to the PDP. This module contains two groups that are summarized in this section.
DiffServ Capabilities Group
This group consists of PRCs to indicate to the PDP the types of interface supported on the PEP in terms of their DiffServ capabilities, and PRCs that the PDP can install in order to configure these interfaces (queues, scheduling parameters, buffer sizes, etc.) to effect the desired policy. This group describes capabilities in terms of the types of interfaces and takes configuration in terms of interface types and role combinations [FR-PIB]; it does not deal with individual interfaces on the device.
DiffServ Policy Group
This group contains configurations of the functional elements that comprise the DiffServ policy that applies to an interface and the specific parameters that describe those elements. This group contains classifiers, meters, actions, droppers, queues and schedulers. This group also contains the PRC that associates the datapath elements with role combinations.
7. PIB Operational Overview
This section provides an operational overview of configuring DiffServ QoS policy.
After the initial PEP to PDP communication setup, using [COPS-PR] for example, the PEP provides the PDP with the PIB Provisioning Classes (PRCs), interface types, and interface type capabilities it supports.
The PRCs supported by the PEP are reported to the PDP in the PRC Support Table, frwkPrcSupportTable, defined in the framework PIB [FR-PIB]. Each instance of the frwkPrcSupportTable indicates a PRC that the PEP understands and for which the PDP can send class instances as part of the policy information.
The capabilities of the interface types the PEP supports are described by rows in the capability set table, frwkCapabilitySetTable. Each row, or instance of this class, contains a pointer to an instance of a PRC that describes the capabilities of the interface type. The capability objects may reside in the dsIfClassificationCapsTable, the dsIfMeteringCapsTable, the dsIfSchedulingCapsTable, the dsIfElmDepthCapsTable, the dsIfElmLinkCapsTable, or in a table defined in another PIB.
The PDP, with knowledge of the PEP's capabilities, then provides the PEP with administrative domain and interface-type-specific policy information. Instances of the dsDataPathTable are used to specify the first element in the set of functional elements applied to an interface type. Each instance of the dsDataPathTable applies to an interface type defined by its roles and direction (ingress or egress).
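On the PDP side, the PRC Support Table acts as a filter on what may be sent. A minimal Python sketch, assuming policy instances are summarized as (PRC name, description) pairs; split_by_support and PolicyInstance are hypothetical names. What the PDP does with unsupported PRCs, for example choosing an alternative policy, is a PDP design decision not specified here.

```python
from typing import Iterable, List, Set, Tuple

# (PRC name, description of one instance); real instances are PRIs
# identified by PRIDs and carried in [COPS-PR] messages.
PolicyInstance = Tuple[str, str]


def split_by_support(policy: Iterable[PolicyInstance],
                     supported_prcs: Set[str]
                     ) -> Tuple[List[PolicyInstance], List[str]]:
    """Separate instances of PRCs the PEP reported in frwkPrcSupportTable
    from the names of PRCs it did not report (each listed once)."""
    sendable: List[PolicyInstance] = []
    unsupported: List[str] = []
    for prc, instance in policy:
        if prc in supported_prcs:
            sendable.append((prc, instance))
        elif prc not in unsupported:
            unsupported.append(prc)
    return sendable, unsupported
```

Checking support before sending avoids relying on the PEP's failure report for errors the PDP could have predicted from the PEP's own capability reporting.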