4. A Reference Implementation based on GateD
In this section we report on the experience gained from implementing the pre-computation based approach of Section 2.3.1 in the GateD [Con] environment. First, we briefly introduce the GateD environment, and then present some details on how the QoS extensions were implemented in this environment. Finally, we discuss issues that arose during the implementation effort and present some measurement based results on the overhead that the QoS extensions impose on a QoS capable router and a network of QoS routers. For further details on the implementation study, the reader is referred to [AGK99]. Additional performance evaluation based on simulations can be found in [AGKT98].4.1. The Gate Daemon (GateD) Program
GateD [Con] is a popular, public domain (9) program that provides a platform for implementing routing protocols on hosts running the Unix operating system. The distribution of the GateD software also includes implementations of many popular routing protocols, including the OSPF protocol. The GateD environment offers a variety of services useful for implementing a routing protocol. These services
include a) support for creation and management of timers, b) memory management, c) a simple scheduling mechanism, d) interfaces for manipulating the host's routing table and accessing the network, and e) route management (e.g., route prioritization and route exchange between protocols). All GateD processing is done within a single Unix process, and routing protocols are implemented as one or several tasks. A GateD task is a collection of code associated with a Unix socket. The socket is used for the input and output requirements of the task. The main loop of GateD contains, among other operations, a select() call over all task sockets to determine if any read/write or error conditions occurred in any of them. GateD implements the OSPF link state database using a radix tree for fast access to individual link state records. In addition, link state records for neighboring network elements (such as adjacent routers) are linked together at the database level with pointers. GateD maintains a single routing table that contains routes discovered by all the active routing protocols. Multiple routes to the same destination are prioritized according to a set of rules and administrative preferences and only a single route is active per destination. These routes are periodically downloaded in the host's kernel forwarding table.4.2. Implementing the QoS Extensions of OSPF
4.2.1. Design Objectives and Scope
One of our major design objectives was to gain substantial experience with a functionally complete QoS routing implementation while containing the overall implementation complexity. Thus, our architecture was modular and aimed at reusing the existing OSPF code with only minimal changes. QoS extensions were localized to specific modules and their interaction with existing OSPF code was kept to a minimum. Besides reducing the development and testing effort, this approach also facilitated experimentation with different alternatives for implementing the QoS specific features such as triggering policies for link state updates and QoS route table computation. Several of the design choices were also influenced by our assumptions regarding the core functionalities that an early prototype implementation of QoS routing must demonstrate. Some of the important assumptions/requirements are: - Support for only hop-by-hop routing. This affected the path structure in the QoS routing table as it only needs to store next hop information. As mentioned earlier, the structure can be easily extended to allow construction of explicit routes.
- Support for path pre-computation. This required the creation of a separate QoS routing table and its associated path structure, and was motivated by the need to minimize processing overhead. - Full integration of the QoS extensions into the GateD framework, including configuration support, error logging, etc. This was required to ensure a fully functional implementation that could be used by others. - Ability to allow experimentation with different approaches, e.g., use of different update and pre-computation triggering policies with support for selection and parameterization of these policies from the GateD configuration file. - Decoupling from local traffic and resource management components, i.e., packet classifiers and schedulers and local call admission. This is supported by providing an API between QoS routing and the local traffic management module, which hides all internal details or mechanisms. Future implementations will be able to specify their own mechanisms for this module. - Interface to RSVP. The implementation assumes that RSVP [RZB+97] is the mechanism used to request routes with specific QoS requirements. Such requests are communicated through an interface based on [GKR97], and used the RSVP code developed at ISI, version 4.2a2 [RZB+97]. In addition, our implementation also relies on several of the simplifying assumptions made earlier in this document, namely: - The scope of QoS route computation is currently limited to a single area. - All routers within the area are assumed to run a QoS enabled version of OSPF, i.e., inter-operability with non-QoS aware versions of the OSPF protocol is not considered. - All interfaces on a router are assumed to be QoS capable.4.2.2. Architecture
The above design decisions and assumptions resulted in the architecture shown in Figure 2. It consists of three major components: the signaling component (RSVP in our case); the QoS routing component; and the traffic manager. In the rest of this section we concentrate on the structure and operation of the QoS routing component. As can be seen in Figure 2, the QoS routing extensions are further divided into the following modules:
- Update trigger module determines when to advertise local link state updates. This module implements a variety of triggering policies: periodic, threshold based triggering, and class based triggering. This module also implements a hold-down timer that enforces minimum spacing between two consecutive update triggerings from the same node. - Pre-computation trigger module determines when to perform QoS path pre-computation. So far, this module implements only periodic pre-computation triggering. - Path pre-computation module computes the QoS routing table based on the QoS specific link state information as described in Section 2.3.1. - Path selection and management module selects a path for a request with particular QoS requirements, and manages it once selected, i.e., reacts to link or reservation failures. Path selection is performed as described in Section 2.3.1. Path management functionality is not currently supported. - QoS routing table module implements the QoS specific routing table, which is maintained independently of the other GateD routing tables. - Tspec mapping module maps request requirements expressed in the form of RSVP Tspecs and Rspecs into the bandwidth requirements that QoS routing uses.4.3. Major Implementation Issues
Mapping the above design to the framework of the GateD implementation of OSPF led to a number of issues and design decisions. These issues mainly fell under two categories: a) interoperation of the QoS extensions with pre-existing similar OSPF mechanisms, and b) structure, placement, and organization of the QoS routing table. Next, we briefly discuss these issues and justify the resulting design decisions.
+--------------------------------------------------+ | +-----------------------------+ | | | QoS Route Table Computation | | | +-----------------------------+ | | | | | | V | | | +-----------------+ | | +-------------->| QoS Route Table | | | | | +-----------------+ | | | | | | | | +----------------------+ +---------------+ | | | | Core OSPF Functions | | Precomputation| | | | | + | | Trigger | | | | | (Enhanced) Topology | +---------------+ | | | | Data Base | | | | | +----------------------+ | | | | | | | | | | | +----------------------------+ | | | | | Receive and update QoS-LSA | | | | | +----------------------------+ | | | | | | | | | +----------------+ | | | | | Local Interface| | | | | | Status Monitor | | | | | +----------------+ | +----------------+ | | | | | Path Selection | | +--------------+ +----------------+ | | & Management | | | Build and | | Link State | | +----------------+ | | Send QoS-LSA |----------| Update Trigger | | | | +--------------+ +----------------+ | +----------------+ | | | | QoS Parameter | | | | | Mapping | | OSPF with QoS Routing Extensions | | |----------------+ +-------------------------------------------|------+ | | +----------------+ +----------+ | QoS Route | | Local | | Request Client |<---------------------------------------->| Resource | | (e.g. RSVP) | | Manager | +----------------+ +----------+ Figure 2: The software architecture
The ability to trigger link state updates in response to changes in bandwidth availability on interfaces is an essential component of the QoS extensions. Mechanisms for triggering these updates and controlling their rate have been mentioned in Section 2.2. In addition, OSPF implements its own mechanism for triggering link state updates as well as its own hold down timer, which may be incompatible with what is used for the QoS link state updates. We handle such potential conflicts as follows. First, since OSPF triggers updates on a periodic basis with low frequency, we expect these updates to be only a small part of the total volume of updates generated. As a result, we chose to maintain the periodic update triggering of OSPF. Resolving conflicts in the settings of the different hold down timer settings requires more care. In particular, it is important to ensure that the existing OSPF hold down timer does not interfere with QoS updates. One option is to disable the existing OSPF timer, but protection against transient overloads calls for some hold down timer, albeit with a small value. As a result, the existing OSPF hold down timer was kept, but reduced its value to 1 second. This value is low enough (actually is the lowest possible, since GateD timers have a maximum resolution of 1 second) so that it does not interfere with the generation of the QoS link state updates, which will actually often have hold down timers of their own with higher values. An additional complexity is that the triggering of QoS link state updates needs to be made aware of updates performed by OSPF itself. This is necessary, as regular OSPF updates also carry bandwidth information, and this needs to be considered by QoS updates to properly determine when to trigger a new link state update. Another existing OSPF mechanism that has the potential to interfere with the extensions needed for QoS routing, is the support for delayed acknowledgments that allows aggregation of acknowledgments for multiple LSAs. Since link state updates are maintained in retransmission queues until acknowledged, excessive delay in the generation of the acknowledgement combined with the increased rates of QoS updates may result in overflows of the retransmission queues. To avoid these potential overflows, this mechanism was bypassed altogether and LSAs received from neighboring routers were immediately acknowledged. Another approach which was considered but not implemented, was to make QoS LSAs unreliable, i.e., eliminate their acknowledgments, so as to avoid any potential interference. Making QoS LSAs unreliable would be a reasonable design choice because of their higher frequency compared to the regular LSAs and the reduced impact that the loss of a QoS LSA has on the protocol operation. Note that the loss of a QoS LSA does not interfere with the base operation of OSPF, and only transiently reduces the quality of paths discovered by QoS routing.
The structure and placement of the QoS routing table also raises some interesting implementation issues. Pre-computed paths are placed into a QoS routing table. This table is implemented as a set of path structures, one for each destination, which contain all the available paths to this destination. In order to be able to efficiently locate individual path structures, an access structure is needed. In order to minimize the develpement effort, the radix tree structure used for the regular GateD routing tables was reused. In addition, the QoS routing table was kept independent of the GateD routing tables to conform to the design goal of localizing changes and minimizing the impact on the existing OSPF code. An additional reason for maintaining the QoS routing separate and self-contained is that it is re-computed under conditions that are different from those used for the regular routing tables. Furthermore, since the QoS routing table is re-built frequently, it must be organized so that its computation is efficient. A common operation during the computation of the QoS routing table is mapping a link state database entry to the corresponding path structure. In order to make this operation efficient, the link state database entries were extended to contain a pointer to the corresponding path structure. In addition, when a new QoS routing table is to be computed, the previous one must be de-allocated. This is accomplished by traversing the radix tree in-order, and de-allocating each node in the tree. This full de-allocation of the QoS routing table is potentially wasteful, especially since memory allocation and de-allocation is an expensive operation. Furthermore, because path pre-computations are typically not triggered by changes in topology, the set of destinations will usually remain the same and correspond to an unchanged radix tree. A natural optimization would then be to de-allocate only the path structures and maintain the radix tree. A further enhancement would be to maintain the path structures as well, and attempt to incrementally update them only when required. However, despite the potential gains, these optimizations have not been included in the initial implementation. The main reason is that they involve subtle and numerous checks to ensure the integrity of the overall data structure at all times, e.g., correctly remove failed destinations from the radix tree and update the tree accordingly.
4.4. Bandwidth and Processing Overhead of QoS Routing
After completing the implementation outlined in the previous sections, it was possible to perform an experimental study of the cost and nature of the overhead of the QoS routing extensions proposed in this document. In particular, using a simple setup consisting of two interconnected routers, it is possible to measure the cost of individual QoS routing related operations. These operations are: a) computation of the QoS routing table, b) selection of a path from the QoS routing table, c) generation of a link state update, and d) reception of a link state update. Note that the last two operations are not really specific to QoS routing since regular OSPF also performs them. Nevertheless, we expect the more sensitive update triggering mechanisms required for effective QoS routing to result in increased number of updates, making the cost of processing updates an important component of the QoS routing overhead. An additional cost dimension is the memory required for storing the QoS routing table. Scaling of the above costs with increasing sizes of the topology database was investigated by artificially populating the topology databases of the routers under measurement. Table 1 shows how the measured costs depend on the size of the topology. The topology used in the measurements was built by replicating a basic building block consisting of four routers connected with transit networks in a rectangular arrangement. The details of the topology and the measurements can be found in [AGK99]. The system running the GateD software was an IBM IntelliStation Z Pro with a Pentium Pro processor at 200 MHz, 64 MBytes or real memory, running FreeBSD 2.2.5-RELEASE and GateD 4. From the results of Table 1, one can observe that the cost of path pre-computation is not much higher than that of the regular SPF computation. However, path pre- computation may need to be performed much more often than the SPF computation, and this can potentially lead to higher processing costs. This issue was investigated in a set of subsequent experiments, that are described later in this section. The other cost components reported in Table 1 include memory, and it can be seen that the QoS routing table requires roughly 80% more memory than the regular routing table. Finally, the cost of selecting a path is found to be very small compared to the path pre-computation times. As expected, all the measured quantities increase as the size of the topology increases. In particular, the storage requirements and the processing costs for both SPF computation and QoS path pre- computation scale almost linearly with the network size.
________________________________________________________________________ |Link_state_database_size_______|_25_|__49_|__81__|__121_|__169_|__225_| |Regular_SPF_time_(microsec)____|215_|_440_|_747__|_1158_|_1621_|_2187_| |Pre-computation_time_(microsec)|736_|_1622|_2883_|_4602_|_6617_|_9265_| |SPF_routing_table_size_(bytes)_|2608|_4984|_8152_|_12112|_16864|_22408| |QoS_routing_table_size_(bytes)_|3924|_7952|_13148|_19736|_27676|_36796| |Path_Selection_time_(microsec)_|_.7_|_1.6_|__2.8_|__4.6_|__6.6_|__9.2_| Table 1: Stand alone QoS routing costs In addition to the stand alone costs reported in Table 1, it is important to assess the actual operational load induced by QoS routing in the context of a large network. Since it is not practical to reproduce a large scale network in a lab setting, the approach used was to combine simulation and measurements. Specifically, a simulation was used to obtain a time stamped trace of QoS routing related events that occur in a given router in a large scale network. The trace was then used to artificially induce similar load conditions on a real router and its adjacent links. In particular, it was used to measure the processing load at the router and bandwidth usage that could be attributed to QoS updates. A more complete discussion of the measurement method and related considerations can be found in [AGK99]. The use of a simulation further allows the use of different configurations, where network topology is varied together with other QoS parameters such as a) period of pre-computation, and b) threshold for triggering link state updates. The results reported here were derived using two types of topologies. One based on a regular but artificial 8x8 mesh network, and another (isp) which has been used in several previous studies [AGKT98, AT98] and that approximates the network of a nation-wide ISP. As far as pre-computation periods are concerned, three values of 1, 5 and 50 seconds were chosen, and for the triggering of link state update thresholds of 10% and 80% were used. These values were selected as they cover a wide range in terms of precision of pre-computed paths and accuracy of the link state information available at the routers. Also note that 1 second is the smallest pre-computation period allowed by GateD. Table 2 provides results on the processing load at the router driven by the simulation trace, for the two topologies and different combinations of QoS parameters, i.e., pre-computation period and threshold for triggering link state updates. Table 3 gives the bandwidth consumption of QoS updates on the links adjacent to the router.
________________________________________________________________ |_____________________|_________Pre-computation_Period_________| |Link_state_threshold_|___1_sec____|____5_sec____|____50_sec___| |_________10%_________|.45%_(1.6%)_|__.29%_(2%)__|__.17%_(3%)__| |_________80%_________|.16%_(2.4%)_|__.04%_(3%)__|_.02%_(3.8%)_| isp ________________________________________________________________ |_________10%_________|3.37%_(2.1%)|_2.23%_(3.3%)|_1.78%_(7.7%)| |_________80%_________|1.54%_(5.4%)|_.42%_(6.6%)_|_.14%_(10.4%)| 8x8 mesh Table 2: Router processing load and (bandwidth blocking). In Table 2, processing load is expressed as the percentage of the total CPU resources that are consumed by GateD processing. The same table also shows the routing performance that is achieved for each combination of QoS parameters, so that comparison of the different processing cost/routing performance trade-offs can be made. Routing performance is measured using the bandwidth blocking ratio, defined as the sum of requested bandwidth of the requests that were rejected over the total offered bandwidth. As can be seen from Table 2, processing load is low even when the QoS routing table is recomputed every second, and LSAs are generated every time the available bandwidth on a link changes by more than 10% of the last advertised value. This seems to indicate that given today's processor technology, QoS routing should not be viewed as a costly enhancement, at least not in terms of its processing requirements. Another general observation is that while network size has obviously an impact, it does not seem to drastically affect the relative influence of the different parameters. In particular, despite the differences that exist between the isp and mesh topologies, changing the pre- computation period or the update threshold translates into essentially similar relative changes. Similar conclusions can be drawn for the update traffic shown in Table 3. In all cases, this traffic is only a small fraction of the link's capacity. Clearly, both the router load and the link bandwidth consumption depend on the router and link that was the target of the measurements and will vary for different choices. The results shown here are meant to be indicative, and a more complete discussion can be found in [AGK99].
_______________________________________ |_Link_state_threshold_|_______________| |_________10%__________|3112_bytes/sec_| |_________80%__________|177_bytes/sec__| isp ________________________________________ |_________10%__________|15438_bytes/sec_| |_________80%__________|1053_bytes/sec__| 8x8 mesh Table 3: Link state update traffic Summarizing, by carrying out the implementation of the proposed QoS routing extensions to OSPF we demonstrated that such extensions are fairly straightforward to implement. Furthermore, by measuring the performance of the real system we were able to demonstrate that the overheads associated with QoS routing are not excessive, and are definitely within the capabilities of modern processor and workstation technology.5. Security Considerations
The QoS extensions proposed in this document do not raise any security considerations that are in addition to the ones associated with regular OSPF. The security considerations of OSPF are presented in [Moy98]. However, it should be noted that this document assumes the availability of some entity responsible for assessing the legitimacy of QoS requests. For example, when the protocol used for initiating QoS requests is the RSVP protocol, this capability can be provided through the use of RSVP Policy Decision Points and Policy Enforcement Points as described in [YPG97]. Similarly, a policy server enforcing the acceptability of QoS requests by implementing decisions based on the rules and languages of [RMK+98], would also be capable of providing the desired functionality.
APPENDICES
A. Pseudocode for the BF Based Pre-Computation Algorithm
Note: The pseudocode below assumes a hop-by-hop forwarding approach in updating the neighbor field. The modifications needed to support explicit route construction are straightforward. The pseudocode also does not handle equal cost multi-paths for simplicity, but the modification needed to add this support are straightforward. Input: V = set of vertices, labeled by integers 1 to N. L = set of edges, labeled by ordered pairs (n,m) of vertex labels. s = source vertex (at which the algorithm is executed). For all edges (n,m) in L: * b(n,m) = available bandwidth (according to last received update) on interface associated with the edge between vertices n and m. * If(n,m) outgoing interface corresponding to edge (n,m) when n is a router. H = maximum hop-count (at most the graph diameter). Type: tab_entry: record bw = integer, neighbor = integer 1..N. Variables: TT[1..N, 1..H]: topology table, whose (n,h) entry is a tab_entry record, such that: TT[n,h].bw is the maximum available bandwidth (as known thus far) on a path of at most h hops between vertices s and n, TT[n,h].neighbor is the first hop on that path (a neighbor of s). It is either a router or the destination n. S_prev: list of vertices that changed a bw value in the TT table in the previous iteration. S_new: list of vertices that changed a bw value (in the TT table etc.) in the current iteration. The Algorithm: begin; for n:=1 to N do /* initialization */ begin; TT[n,0].bw := 0;
TT[n,0].neighbor := null TT[n,1].bw := 0; TT[n,1].neighbor := null end; TT[s,0].bw := infinity; reset S_prev; for all neighbors n of s do begin; TT[n,1].bw := max( TT[n,1].bw, b[s,n]); if (TT[n,1].bw = b[s,n]) then TT[n,1].neighbor := If(s,n); /* need to make sure we are picking the maximum */ /* bandwidth path for routers that can be reached */ /* through both networks and point-to-point links */ if (n is a router) then S_prev := S_prev union {n} /* only a router is added to S_prev, */ /* if it is not already included in S_prev */ else /* n is a network: */ /* proceed with network--router edges, without */ /* counting another hop */ for all (n,k) in L, k <> s, do /* i.e., for all other neighboring routers of n */ begin; TT[k,1].bw := max( min( TT[n,1].bw, b[n,k]), TT[k,1].bw ); /* In case k could be reached through another path */ /* (a point-to-point link or another network) with */ /* more bandwidth, we do not want to update TT[k,1].bw */ if (min( TT[n,1].bw, b[n,k]) = TT[k,1].bw ) /* If we have updated TT[k,1].bw by going through */ /* network n */ then TT[k,1].neighbor := If(s,n); /* neighbor is interface to network n */ if ( {k} not in S_prev) then S_prev := S_prev union {k} /* only routers are added to S_prev, but we again need */ /* to check they are not already included in S_prev */ end end; for h:=2 to H do /* consider all possible number of hops */ begin; reset S_new; for all vertices m in V do begin; TT[m,h].bw := TT[m,h-1].bw; TT[m,h].neighbor := TT[m,h-1].neighbor end;
for all vertices n in S_prev do /* as shall become evident, S_prev contains only routers */ begin; for all edges (n,m) in L do if min( TT[n,h-1].bw, b[n,m]) > TT[m,h].bw then begin; TT[m,h].bw := min( TT[n,h-1].bw, b[n,m]); TT[m,h].neighbor := TT[n,h-1].neighbor; if m is a router then S_new := S_new union {m} /* only routers are added to S_new */ else /* m is a network: */ /* proceed with network--router edges, without counting */ /* them as another hop */ for all (m,k) in L, k <> n, /* i.e., for all other neighboring routers of m */ if min( TT[m,h].bw, b[m,k]) > TT[k,h].bw then begin; /* Note: still counting it as the h-th hop, as (m,k) is */ /* a network--router edge */ TT[k,h].bw := min( TT[m,h].bw, b[m,k]); TT[k,h].neighbor := TT[m,h].neighbor; S_new := S_new union {k} /* only routers are added to S_new */ end end end; S_prev := S_new; /* the two lists can be handled by a toggle bit */ if S_prev=null then h=H+1 /* if no changes then exit */ end; end.
B. On-Demand Dijkstra Algorithm for QoS Path Computation
In the main text, we described an algorithm that allows pre- computation of QoS routes. However, it may be feasible in some instances, e.g., limited number of requests for QoS routes, to instead perform such computations on-demand, i.e., upon receipt of a request for a QoS route. The benefit of such an approach is that depending on how often recomputation of pre-computed routes is triggered, on-demand route computation can yield better routes by using the most recent link metrics available. Another benefit of on-demand path computation is the associated storage saving, i.e., there is no need for a QoS routing table. This is essentially the standard trade-off between memory and processing cycles. In this section, we briefly describe how a standard Dijkstra algorithm can, for a given destination and bandwidth requirement, generate a minimum hop path that can accommodate the required bandwidth and also has maximum bandwidth. Because the Dijkstra algorithm is already used in the current OSPF route computation, only differences from the standard algorithm are described. Also, while for simplicity we do not consider here zero-hop edges, the modification required for supporting them is straightforward. The algorithm essentially performs a minimum hop path computation, on a graph from which all edges, whose available bandwidth is less than that requested by the flow triggering the computation, have been removed. This can be performed either through a pre-processing step, or while running the algorithm by checking the available bandwidth value for any edge that is being considered (see the pseudocode that follows). Another modification to a standard Dijkstra based minimum hop count path computation, is that the list of equal cost next (previous) hops which is maintained as the algorithm proceeds, needs to be sorted according to available bandwidth. This is to allow selection of the minimum hop path with maximum available bandwidth. Alternatively, the algorithm could also be modified to, at each step, only keep among equal hop count paths the one with maximum available bandwidth. This would essentially amount to considering a cost that is function of both hop count and available bandwidth. Note: The pseudocode below assumes a hop-by-hop forwarding approach in updating the neighbor field. Addition of routes to stub networks is done in a second phase as usual. The modifications needed to support explicit route construction are straightforward. The pseudocode also does not handle equal cost multi-paths for simplicity, but the modifications needed to add this support are also easily done.
Input: V = set of vertices, labeled by integers 1 to N. L = set of edges, labeled by ordered pairs (n,m) of vertex labels. s = source vertex (at which the algorithm is executed). For all edges (n,m) in L: * b(n,m) = available bandwidth (according to last received update) on interface associated with the edge between vertices n and m. * If(n,m) = outgoing interface corresponding to edge (n,m) when n is a router. d = destination vertex. B = requested bandwidth for the flow served. Type: tab_entry: record hops = integer, neighbor = integer 1..N, ontree = boolean. Variables: TT[1..N]: topology table, whose (n) entry is a tab_entry record, such that: TT[n].bw is the available bandwidth (as known thus far) on a shortest-path between vertices s and n, TT[n].neighbor is the first hop on that path (a neighbor of s). It is either a router or the destination n. S: list of candidate vertices; v: vertex under consideration; The Algorithm: begin; for n:=1 to N do /* initialization */ begin; TT[n].hops := infinity; TT[n].neighbor := null; TT[n].ontree := FALSE; end; TT[s].hops := 0; reset S; v:= s; while v <> d do begin; TT[v].ontree := TRUE; for all edges (v,m) in L and b(v,m) >= B do begin;
if m is a router begin; if not TT[m].ontree then begin; /* bandwidth must be fulfilled since all links >= B */ if TT[m].hops > TT[v].hops + 1 then begin S := S union { m }; TT[m].hops := TT[v].hops + 1; TT[m].neighbor := v; end; end; end; else /* must be a network, iterate over all attached routers */ begin; /* each network -- router edge treated as zero hop edge */ for all (m,k) in L, k <> v, /* i.e., for all other neighboring routers of m */ if not TT[k].ontree and b(m,k) >= B then begin; if TT[k].hops > TT[v].hops then begin; S := S union { k }; TT[k].hops := TT[v].hops; TT[k].neighbor := v; end; end; end; end; /* of all edges from the vertex under consideration */ if S is empty then begin; v=d; /* which will end the algorithm */ end; else begin; v := first element of S; S := S - {v}; /* remove and store the candidate to consider */ end; end; /* from processing of the candidate list */ end.
C. Precomputation Using Dijkstra Algorithm
This appendix outlines a Dijkstra-based algorithm that allows pre- computation of QoS routes for all destinations and bandwidth values. The benefit of using a Dijkstra-based algorithm is a greater synergy with existing OSPF implementations. The solution to compute all "best" paths is to consecutively compute shortest path spanning trees starting from a complete graph and removing links with less bandwidth than the threshold used in the previous computation. This yields paths with possibly better bandwidth but of course more hops. Despite the large number of Dijkstra computations involved, several optimizations such as incremental spanning tree computation can be used and allow for efficient implementations in terms of complexity as well as storage since the structure of computed paths leans itself towards path compression [ST83]. Details including measurements and applicability studies can be found in [Prz95] and [BP95]. A variation of this theme is to trade the "accuracy" of the pre- computed paths, (i.e., the paths being generated may be of a larger hop count than needed) for the benefit of using a modified version of Dijkstra shortest path algorithm and also saving on some computations. This loss in accuracy comes from the need to rely on quantized bandwidth values, which are used when computing a minimum hop count path. In other words, the range of possible bandwidth values that can be requested by a new flow is mapped into a fixed number of quantized values, and minimum hop count paths are generated for each quantized value. For example, one could assume that bandwidth values are quantized as low, medium, and high, and minimum hop count paths are computed for each of these three values. A new flow is then assigned to the minimum hop path that can carry the smallest quantized value, i.e., low, medium, or high, larger than or equal to what it requested. We restrict our discussion here to this "quantized" version of the algorithm. Here too, we discuss the elementary case where all edges count as "hops", and note that the modification required for supporting zero- hop edges is straightforward. As with the BF algorithm, the algorithm relies on a routing table that gets built as the algorithm progresses. The structure of the routing table is as follows: The QoS routing table: - a K x Q matrix, where K is the number of vertices and Q is the number of quantized bandwidth values.
- The (n;q) entry contains information that identifies the minimum hop count path to destination n, which is capable of accommodating a bandwidth request of at least bw[q] (the qth quantized bandwidth value). It consists of two fields: * hops: the minimal number of hops on a path between the source node and destination n, which can accommodate a request of at least bw[q] units of bandwidth. * neighbor: this is the routing information associated with the minimum hop count path to destination node n, whose available bandwidth is at least bw[q]. With a hop-by-hop routing approach, the neighbor information is simply the identity of the node adjacent to the source node on that path. The algorithm operates again on a directed graph where vertices correspond to routers and transit networks. The metric associated with each edge in the graph is as before the bandwidth available on the corresponding interface, where b(n;m) is the available bandwidth on the edge between vertices n and m. The vertex corresponding to the router where the algorithm is being run is selected as the source node for the purpose of path selection, and the algorithm proceeds to compute paths to all other nodes (destinations). Starting with the highest quantization index, Q, the algorithm considers the indices consecutively, in decreasing order. For each index q, the algorithm deletes from the original network topology all links (n;m) for which b(n;m) < bw[q], and then runs on the remaining topology a Dijkstra-based minimum hop count algorithm (10) between the source node and all other nodes (vertices) in the graph. Note that as with the Dijkstra used for on-demand path computation, the elimination of links such that b(n;m) < bw[q] could also be performed while running the algorithm. After the algorithm terminates, the q-th column in the routing table is updated. This amounts to recording in the hops field the hop count value of the path that was generated by the algorithm, and by updating the neighbor field. As before, the update of the neighbor field depends on the scope of the path computation. In the case of a hop-by-hop routing decision, the neighbor field is set to the identity of the node adjacent to the source node (next hop) on the path returned by the algorithm. However, note that in order to ensure that the path with the maximal available bandwidth is always chosen among all minimum hop paths that can accommodate a given quantized bandwidth, a slightly different update mechanism of the neighbor field needs to be used in some instances. Specifically, when for a given row, i.e., destination node n, the value of the hops field in column q is found equal to the value in column q+1 (here we
assume q<Q), i.e., paths that can accommodate bw[q] and bw[q+ 1] have the same hop count, then the algorithm copies the value of the neighbor field from entry (n;q+1) into that of entry (n;q). Note: The pseudocode below assumes a hop-by-hop forwarding approach in updating the neighbor field. The modifications needed to support explicit route construction are straightforward. The pseudocode also does not handle equal cost multi-paths for simplicity, but the modification needed to add this support have been described above. Details of the post-processing step of adding stub networks are omitted. Input: V = set of vertices, labeled by integers 1 to N. L = set of edges, labeled by ordered pairs (n,m) of vertex labels. s = source vertex (at which the algorithm is executed). bw[1..Q] = array of bandwidth values to "quantize" flow requests to. For all edges (n,m) in L: * b(n,m) = available bandwidth (according to last received update) on interface associated with the edge between vertices n and m. * If(n,m) = outgoing interface corresponding to edge (n,m) when n is a router. Type: tab_entry: record hops = integer, neighbor = integer 1..N, ontree = boolean. Variables: TT[1..N, 1..Q]: topology table, whose (n,q) entry is a tab_entry record, such that: TT[n,q].bw is the available bandwidth (as known thus far) on a shortest-path between vertices s and n accommodating bandwidth b(q), TT[n,q].neighbor is the first hop on that path (a neighbor of s). It is either a router or the destination n. S: list of candidate vertices; v: vertex under consideration; q: "quantize" step The Algorithm: begin; for r:=1 to Q do begin; for n:=1 to N do /* initialization */
begin; TT[n,r].hops := infinity; TT[n,r].neighbor := null; TT[n,r].ontree := FALSE; end; TT[s,r].hops := 0; end; for r:=1 to Q do begin; S = {s}; while S not empty do begin; v := first element of S; S := S - {v}; /* remove and store the candidate to consider */ TT[v,r].ontree := TRUE; for all edges (v,m) in L and b(v,m) >= bw[r] do begin; if m is a router begin; if not TT[m,r].ontree then begin; /* bandwidth must be fulfilled since all links >= bw[r] */ if TT[m,r].hops > TT[v,r].hops + 1 then begin S := S union { m }; TT[m,r].hops := TT[v,r].hops + 1; TT[m,r].neighbor := v; end; end; end; else /* must be a network, iterate over all attached routers */ begin; for all (m,k) in L, k <> v, /* i.e., for all other neighboring routers of m */ if not TT[k,r].ontree and b(m,k) >= bw[r] then begin; if TT[k,r].hops > TT[v,r].hops + 2 then begin; S := S union { k }; TT[k,r].hops := TT[v,r].hops + 2; TT[k,r].neighbor := v; end; end; end; end; /* of all edges from the vertex under consideration */ end; /* from processing of the candidate list */ end; /* of "quantize" steps */
end.D. Explicit Routing Support
As mentioned before, the scope of the path selection process can range from simply returning the next hop on the QoS path selected for the flow, to specifying the complete path that was computed, i.e., an explicit route. Obviously, the information being returned by the path selection algorithm differs in these two cases, and constructing it imposes different requirements on the path computation algorithm and the data structures it relies on. While the presentation of the path computation algorithms focused on the hop-by-hop routing approach, the same algorithms can be applied to generate explicit routes with minor modifications. These modifications and how they facilitate constructing explicit routes are discussed next. The general approach to facilitate construction of explicit routes is to update the neighbor field differently from the way it is done for hop-by-hop routing as described in Section 2.3. Recall that in the path computation algorithms the neighbor field is updated to reflect the identity of the router adjacent to the source node on the partial path computed. This facilitates returning the next hop at the source for the specific path. In the context of explicit routing, the neighbor information is updated to reflect the identity of the previous router on the path. In general, there can be multiple equivalent paths for a given hop count. Thus, the neighbor information is stored as a list rather than single value. Associated with each neighbor, additional information is stored to facilitate load balancing among these multiple paths at the time of path selection. Specifically, we store the advertised available bandwidth on the link from the neighbor to the destination in the entry. With this change, the basic approach used to extract the complete list of vertices on a path from the neighbor information in the QoS routing table is to proceed `recursively' from the destination to the origin vertex. The path is extracted by stepping through the precomputed QoS routing table from vertex to vertex, and identifying at each step the corresponding neighbor (precursor) information. The process is described as recursive since the neighbor node identified in one step becomes the destination node for table look up in the next step. Once the source router is reached, the concatenation of all the neighbor fields that have been extracted forms the desired explicit route. This applies to algorithms of Section 2.3.1 and Appendix C. If at a particular stage there are multiple neighbor choices (due to equal cost multi-paths), one of them can be chosen at random with a probability that is weighted, for example, by the
associated bandwidth on the link from the neighbor to the (current) destination. Specifically, assume a new request to destination, say, d, and with bandwidth requirements B. The index of the destination vertex identifies the row in the QoS routing table that needs to be checked to generate a path. The row is then searched to identify a suitable path. If the Bellman-Ford algorithm of Section 2.3.1 was used, the search proceeds by increasing index (hop) count until an entry is found, say at hop count or column index of h, with a value of the bw field that is equal to or greater than B. This entry points to the initial information identifying the selected path. If the Dijkstra algorithm of Appendix C is used, the first quantized value qB such that qB >= B is first identified, and the associated column then determines the first entry in the QoS routing table that identifies the selected path. Once this first entry has been identified, reconstruction of the complete list of vertices on the path proceeds similarly, whether the table was built using the algorithm of Section 2.3.1 or Appendix C. Specifically, in both cases, the neighbor field in each entry points to the previous node on the path from the source node and with the same bandwidth capabilities as those associated with the current entry. The complete path is, therefore, reconstructed by following the pointers provided by the neighbor field of successive entries. In the case of the Bellman-Ford algorithm of Section 2.3.1, this means moving backwards in the table from column to column, using at each step the row index pointed to by the neighbor field of the entry in the previous column. Each time, the corresponding vertex index specified in the neighbor field is pre-pended to the list of vertices constructed so far. Since we start at column h, the process ends when the first column is reached, i.e., after h steps, at which point the list of vertices making up the path has been reconstructed. In the case of the Dijkstra algorithm of Appendix C, the backtracking process is similar although slightly different because of the different relation between paths and columns in the routing table, i.e., a column now corresponds to a quantized bandwidth value instead of a hop count. The backtracking now proceeds along the column corresponding to the quantized bandwidth value needed to satisfy the bandwidth requirements of the flow. At each step, the vertex index specified in the neighbor field is pre-pended to the list of vertices constructed so far, and is used to identify the next row index to move to. The process ends when an entry is reached whose neighbor field specifies the origin vertex of the flow. Note that since there are as many rows in the table as there are vertices in the graph, i.e., N, it could take up to N steps before the process terminates.
Note that the identification of the first entry in the routing table is identical to what was described for the hop-by-hop routing case. However, as described in this section, the update of the neighbor fields while constructing the QoS routing tables, is being performed differently in the explicit and hop-by-hop routing cases. Clearly, two different neighbor fields can be kept in each entry and updates to both could certainly be performed jointly, if support for both xplicit routing and hop-by-hop routing is needed.Endnotes
1. In this document we commit the abuse of notation of calling a "network" the interconnection of routers and networks through which we attempt to compute a QoS path. 2. This is true for uni-cast flows, but in the case of multi-cast flows, hop-by-hop and an explicit routing clearly have different implications. 3. Some hysteresis mechanism should be added to suppress updates when the metric value oscillates around a class boundary. 4. In this document, we use the terms node and vertex interchangeably. 5. Various hybrid methods can also be envisioned, e.g., periodic computations except if more than a given number of updates are received within a shorter interval, or periodic updates except if the change in metrics corresponding to a given update exceeds a certain threshold. Such variations are, however, not considered in this document. 6. Modifications to support explicit routing are discussed in Appendix D. 7. Note, that this does not say anything on whether to differentiate between outgoing and incoming bandwidth on a shared media network. As a matter of fact, a reasonable option is to set the incoming bandwidth (from network to router) to infinity, and only use the outgoing bandwidth value to characterize bandwidth availability on the shared network. 8. exponent in parenthesis 9. Access to some of the more recent versions of the GateD software is restricted to the GateD consortium members.
10. Note that a Breadth-First-Search (BFS) algorithm [CLR90] could also be used. It has a lower complexity, but would not allow reuse of existing code in an OSPF implementation.References
[AGK99] G. Apostolopoulos, R. Guerin, and S. Kamat. Implementation and performance meassurements of QoS routing extensions to OSPF. In Proceedings of INFOCOM'99, pages 680--688, New York, NY, March 1999. [AGKT98] G. Apostolopoulos, R. Guerin, S. Kamat, and S. K. Tripathi. QoS routing: A performance perspective. In Proceedings of ACM SIGCOMM'98, pages 17--28, Vancouver, Canada, October [Alm92] Almquist, P., "Type of Service in the Internet Protocol Suite", RFC 1349, July 1992. [AT98] G. Apostolopoulos and S. K. Tripathi. On reducing the processing cost of on-demand QoS path computation. In Proceedings of ICNP'98, pages 80--89, Austin, TX, October 1998. [BP95] J.-Y. Le Boudec and T. Przygienda. A Route Pre-Computation Algorithm for Integrated Services Networks. Journal of Network and Systems Management, 3(4), 1995. [Car79] B. Carre. Graphs and Networks. Oxford University Press, ISBN 0-19-859622-7, Oxford, UK, 1979. [CLR90] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, MA, 1990. [Con] Merit GateD Consortium. The Gate Daemon (GateD) project. [GJ79] M.R. Garey and D.S. Johnson. Computers and Intractability. Freeman, San Francisco, 1979. [GKH97] R. Guerin, S. Kamat, and S. Herzog. QoS Path Management with RSVP. In Proceedings of the 2nd IEEE Global Internet Mini-Conference, pages 1914-1918, Phoenix, AZ, November [GKR97] Guerin, R., Kamat, S. and E. Rosen, "An Extended RSVP Routing Interface, Work in Progress. [GLG+97] Der-Hwa G., Li, T., Guerin, R., Rosen, E. and S. Kamat, "Setting Up Reservations on Explicit Paths using RSVP", Work in Progress.
[GO99] R. Guerin and A. Orda. QoS-Based Routing in Networks with Inaccurate Information: Theory and Algorithms. IEEE/ACM Transactions on Networking, 7(3):350--364, June 1999. [GOW97] R. Guerin, A. Orda, and D. Williams. QoS Routing Mechanisms and OSPF Extensions. In Proceedings of the 2nd IEEE Global Internet Mini-Conference, pages 1903-1908, Phoenix, AZ, November 1997. [KNB98] Nichols, K., Blake, S., Baker F. and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, December 1998. [LO98] D. H. Lorenz and A. Orda. QoS Routing in Networks with Uncertain Parameters. IEEE/ACM Transactions on Networking, 6(6):768--778, December 1998. [Moy94] Moy, J., "OSPF Version 2", RFC 1583, March 1994. [Moy98] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998. [Prz95] A. Przygienda. Link State Routing with QoS in ATM LANs. Ph.D. Thesis Nr. 11051, Swiss Federal Institute of Technology, April 1995. [RMK+98] R. Rajan, J. C. Martin, S. Kamat, M. See, R. Chaudhury, D. Verma, G. Powers, and R. Yavatkar. Schema for differentiated services and integrated services in networks. INTERNET-DRAFT, October 1998. work in progress. [RZB+97] Braden, R., Editor, Zhang, L., Berson, S., Herzog, S. and S. Jamin, "Resource reSerVation Protocol (RSVP) Version 1, Functional Specification", RFC 2205, September 1997. [SPG97] Shenker, S., Partridge, C. and R. Guerin, "Specification of Guaranteed Quality of Service", RFC 2212, November 1997. [ST83] D.D. Sleator and R.E. Tarjan. A Data Structure for Dynamic Trees. Journal of Computer Systems, 26, 1983. [Tan89] A. Tannenbaum. Computer Networks. Addisson Wesley, 1989. [YPG97] Yavatkar, R., Pendarakis, D. and R. Guerin, "A Framework for Policy-based Admission Control", INTERNET-DRAFT, April 1999. Work in Progress.
Authors' Addresses
George Apostolopoulos IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 Phone: +1 914 784-6204 Fax: +1 914 784-6205 EMail: georgeap@watson.ibm.com Roch Guerin University Of Pennsylvania Department of Electrical Engineering, Rm 367 GRW 200 South 33rd Street Philadelphia, PA 19104--6390 Phone: +1 215-898-9351 EMail: guerin@ee.upenn.edu Sanjay Kamat Bell Laboratories Lucent Technologies Room 4C-510 101 Crawfords Corner Road Holmdel, NJ 07733 Phone: (732) 949-5936 email: sanjayk@dnrc.bell-labs.com Ariel Orda Dept. Electrical Engineering Technion - I.I.T Haifa, 32000 - ISRAEL Phone: +011 972-4-8294646 Fax: +011 972-4-8323041 EMail: ariel@ee.technion.ac.il
Tony Przygienda Siara Systems 300 Ferguson Drive Moutain View California 94043 Phone: +1 732 949-5936 Email: prz@siara.com Doug Williams IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 Phone: +1 914 784-5047 Fax: +1 914 784-6318 EMail: dougw@watson.ibm.com
Full Copyright Statement Copyright (C) The Internet Society (1999). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Acknowledgement Funding for the RFC Editor function is currently provided by the Internet Society.