
RFC 2975

Introduction to Accounting Management

Pages: 54
Informational

Network Working Group                                          B. Aboba
Request for Comments: 2975                        Microsoft Corporation
Category: Informational                                        J. Arkko
                                                               Ericsson
                                                          D. Harrington
                                                 Cabletron Systems Inc.
                                                           October 2000


                 Introduction to Accounting Management

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2000).  All Rights Reserved.

Abstract

The field of Accounting Management is concerned with the collection of resource consumption data for the purposes of capacity and trend analysis, cost allocation, auditing, and billing. This document describes each of these problems, and discusses the issues involved in design of modern accounting systems. Since accounting applications do not have uniform security and reliability requirements, it is not possible to devise a single accounting protocol and set of security services that will meet all needs. Thus the goal of accounting management is to provide a set of tools that can be used to meet the requirements of each application. This document describes the currently available tools as well as the state of the art in accounting protocol design. A companion document, RFC 2924, reviews the state of the art in accounting attributes and record formats.

Table of Contents

   1.  Introduction
       1.1  Requirements language
       1.2  Terminology
       1.3  Accounting management architecture
       1.4  Accounting management objectives
       1.5  Intra-domain and inter-domain accounting
       1.6  Accounting record production
       1.7  Requirements summary
   2.  Scaling and reliability
       2.1  Fault resilience
       2.2  Resource consumption
       2.3  Data collection models
   3.  Review of Accounting Protocols
       3.1  RADIUS
       3.2  TACACS+
       3.3  SNMP
   4.  Review of Accounting Data Transfer
       4.1  SMTP
       4.2  Other protocols
   5.  Summary
   6.  Security Considerations
   7.  Acknowledgments
   8.  References
   9.  Authors' Addresses
   10. Intellectual Property Statement
   11. Full Copyright Statement

1. Introduction

The field of Accounting Management is concerned with the collection of resource consumption data for the purposes of capacity and trend analysis, cost allocation, auditing, and billing. This document describes each of these problems, and discusses the issues involved in design of modern accounting systems. Since accounting applications do not have uniform security and reliability requirements, it is not possible to devise a single accounting protocol and set of security services that will meet all needs. Thus the goal of accounting management is to provide a set of tools that can be used to meet the requirements of each application. This document describes the currently available tools as well as the state of the art in accounting protocol design. A companion document, RFC 2924, reviews the state of the art in accounting attributes and record formats.

1.1. Requirements language

In this document, the key words "MAY", "MUST", "MUST NOT", "optional", "recommended", "SHOULD", and "SHOULD NOT" are to be interpreted as described in [6].

1.2. Terminology

This document frequently uses the following terms:

   Accounting
             The collection of resource consumption data for the
             purposes of capacity and trend analysis, cost allocation,
             auditing, and billing.  Accounting management requires that
             resource consumption be measured, rated, assigned, and
             communicated between appropriate parties.

   Archival accounting
             In archival accounting, the goal is to collect all
             accounting data, to reconstruct missing entries as best as
             possible in the event of data loss, and to archive data for
             a mandated time period.  It is "usual and customary" for
             these systems to be engineered to be very robust against
             accounting data loss.  This may include provisions for
             transport layer as well as application layer
             acknowledgments, use of non-volatile storage, interim
             accounting capabilities (stored or transmitted over the
             wire), etc.  Legal or financial requirements frequently
             mandate archival accounting practices, and may often
             dictate that data be kept confidential, regardless of
             whether it is to be used for billing purposes or not.

   Rating
             The act of determining the price to be charged for use of a
             resource.

   Billing
             The act of preparing an invoice.

   Usage sensitive billing
             A billing process that depends on usage information to
             prepare an invoice can be said to be usage-sensitive.  In
             contrast, a process that is independent of usage
             information is said to be non-usage-sensitive.

   Auditing
             The act of verifying the correctness of a procedure.  In
             order to be able to conduct an audit it is necessary to be
             able to definitively determine what procedures were
             actually carried out so as to be able to compare this to
             the recommended process.  Accomplishing this may require
             security services such as authentication and integrity
             protection.

   Cost Allocation
             The act of allocating costs between entities.  Note that
             cost allocation and rating are fundamentally different
             processes.  In cost allocation the objective is typically
             to allocate a known cost among several entities.  In rating
             the objective is to determine the amount to be charged for
             use of a resource.  In cost allocation, the cost per unit
             of resource may need to be determined; in rating, this is
             typically a given.

   Interim accounting
             Interim accounting provides a snapshot of usage during a
             user's session.  This may be useful in the event of a
             device reboot or other network problem that prevents the
             reception or generation of a session summary packet or
             session record.  Interim accounting records can always be
             summarized without the loss of information.  Note that
             interim accounting records may be stored internally on the
             device (such as in non-volatile storage) so as to survive a
             reboot and thus may not always be transmitted over the
             wire.

   Session record
             A session record represents a summary of the resource
             consumption of a user over the entire session.  Accounting
             gateways creating the session record may do so by
             processing interim accounting events or accounting events
             from several devices serving the same user.

   Accounting Protocol
             A protocol used to convey data for accounting purposes.

   Intra-domain accounting
             Intra-domain accounting involves the collection of
             information on resource usage within an administrative
             domain, for use within that domain.  In intra-domain
             accounting, accounting packets and session records
             typically do not cross administrative boundaries.

   Inter-domain accounting
             Inter-domain accounting involves the collection of
             information on resource usage within an administrative
             domain, for use within another administrative domain.  In
             inter-domain accounting, accounting packets and session
             records will typically cross administrative boundaries.

   Real-time accounting
             Real-time accounting involves the processing of information
             on resource usage within a defined time window.  Time
             constraints are typically imposed in order to limit
             financial risk.

   Accounting server
             The accounting server receives accounting data from devices
             and translates it into session records.  The accounting
             server may also take responsibility for the routing of
             session records to interested parties.

1.3. Accounting management architecture

The accounting management architecture involves interactions between network devices, accounting servers, and billing servers. The network device collects resource consumption data in the form of accounting metrics. This information is then transferred to an accounting server. Typically this is accomplished via an accounting protocol, although it is also possible for devices to generate their own session records.

The accounting server then processes the accounting data received from the network device. This processing may include summarization of interim accounting information, elimination of duplicate data, or generation of session records.

The processed accounting data is then submitted to a billing server, which typically handles rating and invoice generation, but may also carry out auditing, cost allocation, trend analysis or capacity planning functions. Session records may be batched and compressed by the accounting server prior to submission to the billing server in order to reduce the volume of accounting data and the bandwidth required to accomplish the transfer.

One of the functions of the accounting server is to distinguish between inter and intra-domain accounting events and to route them appropriately. For session records containing a Network Access Identifier (NAI), described in [8], the distinction can be made by examining the domain portion of the NAI. If the domain portion is absent or corresponds to the local domain, then the session record is treated as an intra-domain accounting event. Otherwise, it is treated as an inter-domain accounting event.
   Intra-domain accounting events are typically routed to the local
   billing server, while inter-domain accounting events will be routed
   to accounting servers operating within other administrative domains.
   While it is not required that session record formats used in inter
   and intra-domain accounting be the same, this is desirable, since it
   eliminates translations that would otherwise be required.
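
The routing rule described above can be sketched as follows. This is only an illustration, not part of any accounting protocol: the record layout (a dict with an "nai" key), the "local-billing" destination label, and the function names are assumptions made for the example, and the NAI is parsed by simply splitting on the last "@".

```python
from typing import Dict, List, Optional


def nai_realm(nai: str) -> Optional[str]:
    """Return the domain portion of an NAI, or None if it is absent."""
    _, sep, realm = nai.rpartition("@")
    if not sep or not realm:
        return None
    return realm.lower()


def classify(record: dict, local_realm: str) -> str:
    """Absent or local domain means intra-domain; anything else is inter-domain."""
    realm = nai_realm(record["nai"])
    if realm is None or realm == local_realm.lower():
        return "intra-domain"
    return "inter-domain"


def route(records: List[dict], local_realm: str) -> Dict[str, List[dict]]:
    """Group session records by destination (hypothetical labels)."""
    routes: Dict[str, List[dict]] = {}
    for rec in records:
        if classify(rec, local_realm) == "intra-domain":
            dest = "local-billing"          # local billing server
        else:
            dest = nai_realm(rec["nai"])    # accounting server of the other domain
        routes.setdefault(dest, []).append(rec)
    return routes
```

For example, with a local domain of "isp-a.example", records for "alice" and "carol@ISP-A.example" would go to the local billing server, while a record for "bob@isp-b.example" would be queued for the other domain.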

   Where a proxy forwarder is employed, domain-based access controls may
   be employed by the proxy forwarder, rather than by the devices
   themselves.  The network device will typically speak an accounting
   protocol to the proxy forwarder, which may then either convert the
   accounting packets to session records, or forward the accounting
   packets to another domain.  In either case, domain separation is
   typically achieved by having the proxy forwarder sort the session
   records or accounting messages by destination.

   Where the accounting proxy is not trusted, it may be difficult to
   verify that the proxy is issuing correct session records based on the
   accounting messages it receives, since the original accounting
   messages typically are not forwarded along with the session records.
   Therefore where trust is an issue, the proxy typically forwards the
   accounting packets themselves.  Assuming that the accounting protocol
   supports data object security, this allows the end-points to verify
   that the proxy has not modified the data in transit or snooped on the
   packet contents.

   The diagram below illustrates the accounting management architecture:

        +------------+
        |            |
        |   Network  |
        |   Device   |
        |            |
        +------------+
              |
   Accounting |
   Protocol   |
              |
              V
        +------------+                               +------------+
        |            |                               |            |
        |   Org B    |  Inter-domain session records |  Org A     |
        |   Acctg.   |<----------------------------->|  Acctg.    |
        |Proxy/Server|   or accounting protocol      |  Server    |
        |            |                               |            |
        +------------+                               +------------+
              |                                            |
              |                                            |
   Transfer   | Intra-domain                               |
   Protocol   | Session records                            |
              |                                            |
              V                                            V
        +------------+                               +------------+
        |            |                               |            |
        |  Org B     |                               |  Org A     |
        |  Billing   |                               |  Billing   |
        |  Server    |                               |  Server    |
        |            |                               |            |
        +------------+                               +------------+

1.4. Accounting management objectives

Accounting Management involves the collection of resource consumption data for the purposes of capacity and trend analysis, cost allocation, auditing, and billing. Each of these tasks has different requirements.

1.4.1. Trend analysis and capacity planning

In trend analysis and capacity planning, the goal is typically a forecast of future usage. Since such forecasts are inherently imperfect, high reliability is typically not required, and moderate packet loss can be tolerated. Where it is possible to use statistical sampling techniques to reduce data collection
   requirements while still providing the forecast with the desired
   statistical accuracy, it may be possible to tolerate high packet loss
   as long as bias is not introduced.
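
As a rough illustration of why unbiased sampling can stand in for complete collection in forecasting, the sketch below keeps each usage record independently with probability p and scales the sampled sum by 1/p. The uniform sampling scheme, the function names, and the use of per-record byte counts are assumptions chosen for the example; the text above does not prescribe any particular sampling method.

```python
import random


def sample_records(byte_counts, p, rng):
    """Keep each record independently with probability p (uniform sampling)."""
    return [b for b in byte_counts if rng.random() < p]


def estimate_total(sampled, p):
    """Scale the sampled sum by 1/p; unbiased when every record is kept
    with the same probability, regardless of its size."""
    return sum(sampled) / p


if __name__ == "__main__":
    rng = random.Random(1)
    usage = [rng.randint(40, 1500) for _ in range(100000)]
    estimate = estimate_total(sample_records(usage, 0.1, rng), 0.1)
    print("true total:", sum(usage), "estimate:", round(estimate))
```

Loss that hits all records with equal probability behaves like a lower p. Loss that correlates with record size or time of day would bias the estimate, which is the case the text warns against.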

   The security requirements for trend analysis and capacity planning
   depend on the circumstances of data collection and the sensitivity of
   the data.  Additional security services may be required when data is
   being transferred between administrative domains.  For example, when
   information is being collected and analyzed within the same
   administrative domain, integrity protection and authentication may be
   used in order to guard against collection of invalid data.  In
   inter-domain applications confidentiality may be desirable to guard
   against snooping by third parties.

1.4.2. Billing

When accounting data is used for billing purposes, the requirements depend on whether the billing process is usage-sensitive or not.

1.4.2.1. Non-usage sensitive billing

Since by definition, non-usage-sensitive billing does not require usage information, in theory all accounting data can be lost without affecting the billing process. Of course this would also affect other tasks such as trend analysis or auditing, so that such wholesale data loss would still be unacceptable.

1.4.2.2. Usage-sensitive billing

Since usage-sensitive billing processes depend on usage information, packet loss may translate directly to revenue loss. As a result, the billing process may need to conform to financial reporting and legal requirements, and therefore an archival accounting approach may be needed.

Usage-sensitive systems may also require low processing delay. Today credit risk is commonly managed by computerized fraud detection systems that are designed to detect unusual activity. While efficiency concerns might otherwise dictate batched transmission of accounting data, where there is a risk of fraud, financial exposure increases with processing delay. Thus it may be advisable to transmit each event individually to minimize batch size, or even to utilize quality of service techniques to minimize queuing delays. In addition, it may be necessary for authorization to be dependent on ability to pay.

   Whether these techniques will be useful varies by application since
   the degree of financial exposure is application-dependent.  For
   dial-up Internet access from a local provider, charges are typically
   low and therefore the risk of loss is small.  However, in the case of
   dial-up roaming or voice over IP, time-based charges may be
   substantial and therefore the risk of fraud is larger.  In such
   situations it is highly desirable to quickly detect unusual account
   activity, and it may be desirable for authorization to depend on
   ability to pay.  In situations where valuable resources can be
   reserved, or where charges can be high, very large bills may be rung
   up quickly, and processing may need to be completed within a defined
   time window in order to limit exposure.

   Since in usage-sensitive systems, accounting data translates into
   revenue, the security and reliability requirements are greater.  Due
   to financial and legal requirements such systems need to be able to
   survive an audit.  Thus security services such as authentication,
   integrity and replay protection are frequently required and
   confidentiality and data object integrity may also be desirable.
   Application-layer acknowledgments are also often required so as to
   guard against accounting server failures.
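
The combination of authentication, integrity protection, replay protection and application-layer acknowledgment mentioned above can be sketched as below. This is an assumption-laden illustration rather than the mechanism of any protocol reviewed in this document: it assumes a pre-shared key, JSON-encoded records, a per-sender sequence number, and hypothetical names (seal, Receiver, accept). Confidentiality is not addressed.

```python
import hashlib
import hmac
import json


def seal(key: bytes, seq: int, record: dict) -> dict:
    """Attach a sequence number and an HMAC-SHA256 tag to a record."""
    payload = json.dumps({"seq": seq, "record": record}, sort_keys=True)
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


class Receiver:
    """Hypothetical accounting server endpoint."""

    def __init__(self, key: bytes):
        self.key = key
        self.last_seq = -1
        self.stored = []

    def accept(self, message: dict) -> str:
        payload = message["payload"].encode()
        expected = hmac.new(self.key, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, message["tag"]):
            return "rejected"        # failed authentication/integrity check
        body = json.loads(payload)
        if body["seq"] <= self.last_seq:
            return "duplicate"       # replay or retransmission: re-ack, do not store
        self.stored.append(body["record"])   # a real server would commit to disk first
        self.last_seq = body["seq"]
        return "stored"              # application-layer acknowledgment
```

In this sketch a sender would treat both "stored" and "duplicate" as an acknowledgment and would retransmit on timeout or "rejected". Acknowledging only after the record has been stored is what protects against an accounting server failure between receipt and storage.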

1.4.3. Auditing

With enterprise networking expenditures on the rise, interest in auditing is increasing. Auditing, which is the act of verifying the correctness of a procedure, commonly relies on accounting data. Auditing tasks include verifying the correctness of an invoice submitted by a service provider, or verifying conformance to usage policy, service level agreements, or security guidelines.

To permit a credible audit, the auditing data collection process must be at least as reliable as the accounting process being used by the entity that is being audited. Similarly, security policies for the audit should be at least as stringent as those used in preparation of the original invoice. Due to financial and legal requirements, archival accounting practices are frequently required in this application.

Where auditing procedures are used to verify conformance to usage or security policies, security services may be desired. This typically will include authentication, integrity and replay protection as well as confidentiality and data object integrity. In order to permit response to security incidents in progress, auditing applications frequently are built to operate with low processing delay.

1.4.4. Cost allocation

The application of cost allocation and billback methods by enterprise customers is not yet widespread. However, with the convergence of telephony and data communications, there is increasing interest in applying cost allocation and billback procedures to networking costs, as is now commonly practiced with telecommunications costs.

Cost allocation models, including traditional costing mechanisms described in [21]-[23] and activity-based costing techniques described in [24], are typically based on detailed analysis of usage data, and as a result they are almost always usage-sensitive. Whether these techniques are applied to allocation of costs between partners in a venture or to allocation of costs between departments in a single firm, cost allocation models often have profound behavioral and financial impacts. As a result, systems developed for this purpose are typically as concerned with reliable data collection and security as are billing applications. Due to financial and legal requirements, archival accounting practices are frequently required in this application.

1.5. Intra-domain and inter-domain accounting

Much of the initial work on accounting management has focused on intra-domain accounting applications. However, with the increasing deployment of services such as dial-up roaming, Internet fax, Voice and Video over IP and QoS, applications requiring inter-domain accounting are becoming increasingly common.

Inter-domain accounting differs from intra-domain accounting in several important ways. Intra-domain accounting involves the collection of information on resource consumption within an administrative domain, for use within that domain. In intra-domain accounting, accounting packets and session records typically do not cross administrative boundaries. As a result, intra-domain accounting applications typically experience low packet loss and involve transfer of data between trusted entities.

In contrast, inter-domain accounting involves the collection of information on resource consumption within an administrative domain, for use within another administrative domain. In inter-domain accounting, accounting packets and session records will typically cross administrative boundaries. As a result, inter-domain accounting applications may experience substantial packet loss. In addition, the entities involved in the transfers cannot be assumed to trust each other.

   Since inter-domain accounting applications involve transfers of
   accounting data between domains, additional security measures may be
   desirable.  In addition to authentication, replay and integrity
   protection, it may be desirable to deploy security services such as
   confidentiality and data object integrity.  In inter-domain
   accounting each involved party also typically requires a copy of each
   accounting event for invoice generation and auditing.

1.6. Accounting record production

Typically, a single accounting record is produced per session, or in some cases, a set of interim records which can be summarized in a single record for billing purposes. However, to support deployment of services such as wireless access or complex billing regimes, a more sophisticated approach is required.

It is necessary to generate several accounting records from a single session when pricing changes during a session. For instance, the price of a service can be higher during peak hours than off-peak. For a session continuing from one tariff period to another, it becomes necessary for a device to report "packets sent" during both periods.

Time is not the only factor requiring this approach. For instance, in mobile access networks the user may roam from one place to another while still being connected in the same session. If roaming causes a change in the tariffs, it is necessary to account for resource consumed in the first and second areas. Another example is where modifications are allowed to an ongoing session. For example, it is possible that a session could be re-authorized with improved QoS. This would require production of accounting records at both QoS levels.

These examples could be addressed by using vectors or multi-dimensional arrays to represent resource consumption within a single session record. For example, the vector or array could describe the resource consumption for each combination of factors, e.g. one data item could be the number of packets during peak hour in the area of the home operator. However, such an approach seems complicated and inflexible and as a result, most current systems produce a set of records from one session. A session identifier needs to be present in the records to permit accounting systems to tie the records together.

In most cases, the network device will determine when multiple session records are needed, as the local device is aware of factors affecting local tariffs, such as QoS changes and roaming. However, future systems are being designed that enable the home domain to
   control the generation of accounting records.  This is of importance
   in inter-domain accounting or when network devices do not have tariff
   information.  The centralized control of accounting record production
   can be realized, for instance, by having authorization servers
   require re-authorization at certain times and requiring the
   production of accounting records upon each re-authorization.

   In conclusion, in some cases it is necessary to produce multiple
   accounting records from a single session.  It must be possible to do
   this without requiring the user to start a new session or to re-
   authenticate.  The production of multiple records can be controlled
   either by the network device or by the AAA server.  The requirements
   for timeliness, security and reliability in multiple record sessions
   are the same as for single-record sessions.
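
A minimal sketch of multiple record production follows, assuming the device observes per-interval packet counts and can look up the tariff in force at a given time. The record fields, the tariff lookup function and the time units are hypothetical. The point shown is that a new record is opened whenever the tariff changes, and that all records carry the same session identifier so that they can be tied together later.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    session_id: str   # ties the records of one session together
    tariff: str
    start: int        # time of first sample in this record (hypothetical unit)
    end: int          # time of last sample in this record
    packets: int


def split_by_tariff(session_id, samples, tariff_of):
    """Produce one record per contiguous tariff period.

    samples:   list of (timestamp, packets_sent) tuples in time order
    tariff_of: function mapping a timestamp to a tariff name
    """
    records = []
    current = None
    for ts, packets in samples:
        tariff = tariff_of(ts)
        if current is None or current.tariff != tariff:
            # Tariff changed (or first sample): open a new record
            # without ending the user's session.
            current = UsageRecord(session_id, tariff, ts, ts, 0)
            records.append(current)
        current.end = ts
        current.packets += packets
    return records
```

The same structure would apply if the trigger were a roaming event or a QoS change rather than a time-of-day boundary; only the tariff_of lookup would differ.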

1.7. Requirements summary

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                     |                     |
   | Usage         | Intra-domain        | Inter-domain        |
   |               |                     |                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               | Robustness vs.      | Robustness vs.      |
   |               | packet loss         | packet loss         |
   | Capacity      |                     |                     |
   | Planning      | Integrity,          | Integrity,          |
   |               | authentication,     | authentication,     |
   |               | replay protection   | replay prot.        |
   |               | [confidentiality]   | confidentiality     |
   |               |                     | [data object sec.]  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Non-usage     | Integrity,          | Integrity,          |
   | Sensitive     | authentication,     | authentication,     |
   | Billing       | replay protection   | replay protection   |
   |               | [confidentiality]   | confidentiality     |
   |               |                     | [data object sec.]  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               | Archival            | Archival            |
   | Usage         | accounting          | accounting          |
   | Sensitive     | Integrity,          | Integrity,          |
   | Billing,      | authentication,     | authentication,     |
   | Cost          | replay protection   | replay prot.        |
   | Allocation &  | [confidentiality]   | confidentiality     |
   | Auditing      | [Bounds on          | [data object sec.]  |
   |               |  processing delay]  | [Bounds on          |
   |               |                     |  processing delay]  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               | Archival            | Archival            |
   | Time          | accounting          | accounting          |
   | Sensitive     | Integrity,          | Integrity,          |
   | Billing,      | authentication,     | authentication,     |
   | fraud         | replay protection   | replay prot.        |
   | detection,    | [confidentiality]   | confidentiality     |
   | roaming       |                     | [Data object        |
   |               | Bounds on           |  security and       |
   |               |  processing delay   |  receipt support]   |
   |               |                     | Bounds on           |
   |               |                     |  processing delay   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Key
   [] = optional

2. Scaling and reliability

With the continuing growth of the Internet, it is important that accounting management systems be scalable and reliable. This section discusses the resources consumed by accounting management systems as well as the scalability and reliability properties exhibited by various data collection and transport models.

2.1. Fault resilience

As noted earlier, in applications such as usage-sensitive billing, cost allocation and auditing, an archival approach to accounting is frequently mandated, due to financial and legal requirements. Since in such situations loss of accounting data can translate to revenue loss, there is incentive to engineer a high degree of fault resilience. Faults which may be encountered include:

      Packet loss
      Accounting server failures
      Network failures
      Device reboots

To date, much of the debate on accounting reliability has focused on resilience against packet loss and the differences between UDP, SCTP and TCP-based transport. However, it should be understood that resilience against packet loss is only one aspect of meeting archival accounting requirements.

As noted in [18], "once the cable is cut you don't need more retransmissions, you need a *lot* more voltage." Thus, the choice of transport has no impact on resilience against faults such as network partition, accounting server failures or device reboots. What does provide resilience against these faults is non-volatile storage.

The importance of non-volatile storage in design of reliable accounting systems cannot be over-emphasized. Without non-volatile storage, event-driven systems will lose data once the transmission timeout has been exceeded, and batching designs will experience data loss once the internal memory used for accounting data storage has been exceeded. Via use of non-volatile storage, and internally stored interim records, most of these data losses can be avoided.

It may even be argued that non-volatile storage is more important to accounting reliability than network connectivity, since for many years reliable accounting systems were implemented based solely on physical storage, without any network connectivity. For example,
   phone usage data used to be stored on paper, film, or magnetic media
   and carried from the place of collection to a central location for
   bill processing.

2.1.1. Interim accounting

Interim accounting provides protection against loss of session summary data by providing checkpoint information that can be used to reconstruct the session record in the event that the session summary information is lost. This technique may be applied to any data collection model (i.e. event-driven or polling) and is supported in both RADIUS [25] and in TACACS+.

While interim accounting can provide resilience against packet loss, server failures, short-duration network failures, or device reboot, its applicability is limited. Transmission of interim accounting data over the wire should not be thought of as a mainstream reliability improvement technique since it increases use of network bandwidth in normal operation, while providing benefits only in the event of a fault.

Since most packet loss on the Internet is due to congestion, sending interim accounting data over the wire can make the problem worse by increasing bandwidth usage. Therefore on-the-wire interim accounting is best restricted to high-value accounting data such as information on long-lived sessions. To protect against loss of data on such sessions, the interim reporting interval is typically set several standard deviations larger than the average session duration. This ensures that most sessions will not result in generation of interim accounting events and the additional bandwidth consumed by interim accounting will be limited. However, as the interim accounting interval decreases toward the average session time, the additional bandwidth consumed by interim accounting increases markedly, and as a result, the interval must be set with caution.

Where non-volatile storage is unavailable, interim accounting can also result in excessive consumption of memory that could be better allocated to storage of session data. As a result, implementors should be careful to ensure that new interim accounting data overwrites previous data rather than accumulating additional interim records in memory, thereby worsening the buffer exhaustion problem.

Given the increasing popularity of non-volatile storage for use in consumer devices such as digital cameras, such devices are rapidly declining in price. This makes it increasingly feasible for network devices to include built-in support for non-volatile storage. This can be accomplished, for example, by support for compact PCMCIA cards.
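
The guidance on choosing the interim reporting interval can be made concrete with a small calculation. The sketch below is an assumption for illustration only: it takes "several standard deviations" to mean the mean session duration plus k population standard deviations, and it counts one interim event each time a session's age crosses a multiple of the interval. The sample durations are invented.

```python
import statistics


def interim_interval(durations, k=3.0):
    """Mean session duration plus k standard deviations (same unit as input)."""
    return statistics.mean(durations) + k * statistics.pstdev(durations)


def interim_events(durations, interval):
    """Total on-the-wire interim events generated for the given sessions."""
    return sum(int(d // interval) for d in durations)


if __name__ == "__main__":
    # 98 ten-minute sessions, one 100-minute session, one 24-hour session.
    durations = [600] * 98 + [6000, 86400]
    wide = interim_interval(durations, k=3.0)
    narrow = interim_interval(durations, k=0.0)   # interval equal to the mean
    print("k=3:", interim_events(durations, wide), "interim events")
    print("k=0:", interim_events(durations, narrow), "interim events")
```

With these invented durations, only the single long-lived session produces interim events at k=3, whereas setting the interval to the mean session time produces many times more, which illustrates why the interval must be set with caution.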
   Where non-volatile storage is available, this can be used to store
   interim accounting data.  Stored interim events are then replaced by
   updated interim events or by session data when the session completes.
   The session data can itself be erased once the data has been
   transmitted and acknowledged at the application layer.  This approach
   avoids interim data being transmitted over the wire except in the
   case of a device reboot.  When a device reboots, internally stored
   interim records are transferred to the accounting server.
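
   The overwrite-and-replace behavior described above can be sketched
   as follows.  This is a minimal illustration rather than a prescribed
   design: the AccountingStore class, its method names, and the dict
   standing in for non-volatile storage are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    session_id: str
    kind: str      # "interim" or "session"
    octets: int    # cumulative usage at the time the record was written


class AccountingStore:
    """Dict-backed stand-in for non-volatile storage (hypothetical)."""

    def __init__(self):
        self._slots = {}  # session_id -> most recent Record

    def put_interim(self, session_id, octets):
        # Overwrite the slot so interim records never accumulate.
        self._slots[session_id] = Record(session_id, "interim", octets)

    def put_session(self, session_id, octets):
        # On session completion the session record replaces interim data.
        self._slots[session_id] = Record(session_id, "session", octets)

    def acknowledged(self, session_id):
        # Erase only a completed session record, and only once the
        # server has acknowledged it at the application layer.
        rec = self._slots.get(session_id)
        if rec is not None and rec.kind == "session":
            del self._slots[session_id]

    def pending_after_reboot(self):
        # Everything still stored is transferred after a reboot.
        return list(self._slots.values())

    def __len__(self):
        return len(self._slots)
```

   Because each session occupies a single slot, storage use in this
   sketch grows with the number of sessions and not with the number of
   interim updates.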

2.1.2. Multiple record sessions

Generation of multiple accounting records within a session can introduce scalability problems that cannot be controlled using the techniques available in interim accounting. For example, in the case of interim records kept in non-volatile storage, it is possible to overwrite previous interim records with the most recent one or summarize them to a session record. Where interim updates are sent over the wire, it is possible to control bandwidth usage by adjusting the interim accounting interval. These measures are not applicable where multiple session records are produced from a single session, since these records cannot be summarized or overwritten without loss of information. As a result, multiple record production can result in increased consumption of bandwidth and memory. Implementors should be careful to ensure that worst-case multiple record processing requirements do not exceed the capabilities of their systems.

As an example, a tariff change at a particular time of day could, if implemented carelessly, create a sudden peak in the consumption of memory and bandwidth as the records need to be stored and/or transported. Rather than attempting to send all of the records at once, it may be desirable to keep them in non-volatile storage and send all of the related records together in a batch when the session completes. It may also be desirable to shape the accounting traffic flow so as to reduce the peak bandwidth consumption. This can be accomplished by introduction of a randomized delay interval. If the home domain can also control the generation of multiple accounting records, the estimation of the worst-case processing requirements can be very difficult.
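
   As a rough sketch of the randomized delay mentioned above, the
   following spreads a burst of records (for example, those produced by
   a tariff change) over a sending window.  The function names, the
   uniform distribution, and the one-second buckets are assumptions
   made for illustration; the text does not specify how the delay is
   chosen.

```python
import random


def schedule_sends(record_count, window_seconds, rng=None):
    """Return sorted send offsets in seconds, one per record, each drawn
    uniformly from the interval 0..window_seconds."""
    if rng is None:
        rng = random.Random()
    return sorted(rng.uniform(0.0, window_seconds) for _ in range(record_count))


def peak_per_second(offsets):
    """Largest number of sends that fall into any one-second bucket."""
    buckets = {}
    for t in offsets:
        buckets[int(t)] = buckets.get(int(t), 0) + 1
    return max(buckets.values()) if buckets else 0


# Hypothetical burst: 1000 records released by a single tariff change.
unshaped = [0.0] * 1000
shaped = schedule_sends(1000, 60.0, random.Random(1))
```

   Comparing peak_per_second(unshaped) with peak_per_second(shaped)
   shows the burst being flattened, at the cost of delaying some
   records by up to the window length.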

2.1.3. Packet loss

As packet loss is a fact of life on the Internet, accounting protocols dealing with session data need to be resilient against packet loss. This is particularly important in inter-domain accounting, where packets often pass through Network Access Points
   (NAPs) where packet loss may be substantial.  Resilience against
   packet loss can be accomplished via implementation of a retry
   mechanism on top of UDP, or use of TCP [7] or SCTP [26].  On-the-wire
   interim accounting provides only limited benefits in mitigating the
   effects of packet loss.

   UDP-based transport is frequently used in accounting applications.
   However, this is not appropriate in all cases.  Where accounting data
   will not fit within a single UDP packet without fragmentation, use of
   TCP or SCTP transport may be preferred to use of multiple round-trips
   in UDP.  As noted in [47] and [49], this may be an issue in the
   retrieval of large tables.

   In addition, in cases where congestion is likely, such as in inter-
   domain accounting, TCP or SCTP congestion control and round-trip time
   estimation can be very useful in optimizing throughput.  In
   applications which require maintenance of session state, such as
   simultaneous usage control, TCP with application-layer keep-alive
   packets, or SCTP with its built-in heartbeat capabilities, provides
   a mechanism for keeping track of session state.

   When implementing UDP retransmission, there are a number of issues to
   keep in mind:

      Data model
      Retry behavior
      Congestion control
      Timeout behavior

   Accounting reliability can be influenced by how the data is modeled.
   For example, it is almost always preferable to use cumulative
   variables rather than expressing accounting data in terms of a change
   from a previous data item.  With cumulative data, the current state
   can be recovered by a successful retrieval, even after many packets
   have been lost.  However, if the data is transmitted as a change then
   the state will not be recovered until the next cumulative update is
   sent.  Thus, such implementations are much more vulnerable to packet
   loss, and should be avoided wherever possible.
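
   The difference between the two data models can be seen in a small
   simulation.  The function names and the usage figures below are made
   up for illustration; "lost" holds the indices of reports that never
   reach the server.

```python
def receive_cumulative(reports, lost):
    """Server-side view when each report carries the running total."""
    state = 0
    for i, total in enumerate(reports):
        if i not in lost:
            state = total      # any report that arrives restores the state
    return state


def receive_deltas(deltas, lost):
    """Server-side view when each report carries only the change."""
    state = 0
    for i, change in enumerate(deltas):
        if i not in lost:
            state += change    # a lost change is never made up
    return state


deltas = [100, 50, 70, 30]     # usage in each interval (made-up values)
totals = [100, 150, 220, 250]  # the same usage expressed as running totals
lost = {1, 2}                  # the second and third reports are dropped

cumulative_view = receive_cumulative(totals, lost)
delta_view = receive_deltas(deltas, lost)
```

   With cumulative reporting the server ends with the true total of
   250, while with change-based reporting it ends with 130 and has no
   way to recover the missing 120.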

   In designing a UDP retry mechanism, it is important that the retry
   timers relate to the round-trip time, so that retransmissions will
   not typically occur within the period in which acknowledgments may be
   expected to arrive.  Accounting bandwidth may be significant in some
   circumstances, so that the added traffic due to unnecessary
   retransmissions may increase congestion levels.
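
   One way to tie the retry timer to the round-trip time is to borrow
   the smoothed estimator used for the TCP retransmission timer.  The
   text above does not prescribe a particular estimator, so this
   choice, the RetryTimer class, and the default values are assumptions
   made for illustration.

```python
class RetryTimer:
    ALPHA = 1 / 8  # gain for the smoothed round-trip time
    BETA = 1 / 4   # gain for the round-trip time variation

    def __init__(self, initial_timeout=3.0, min_timeout=0.5):
        self.srtt = None
        self.rttvar = None
        self.initial_timeout = initial_timeout
        self.min_timeout = min_timeout

    def observe(self, rtt):
        """Feed one round-trip sample, in seconds, taken from a request
        that was answered without being retransmitted."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - rtt))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt

    def timeout(self):
        """Retry timeout: smoothed RTT plus a margin for its variation."""
        if self.srtt is None:
            return self.initial_timeout
        return max(self.min_timeout, self.srtt + 4 * self.rttvar)
```

   Since the timeout always exceeds the smoothed round-trip time, a
   retransmission in this sketch is not normally sent while an
   acknowledgment could still be in flight.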

   Congestion control in accounting data transfer is a somewhat
   controversial issue.  Since accounting traffic is often considered
   mission-critical, it has been argued that congestion control is not a
   requirement; better to let other less-critical traffic back off in
   response to congestion.  Moreover, without non-volatile storage,
   congestive back-off in accounting applications can result in data
   loss due to buffer exhaustion.

   However, it can also be argued that in modern accounting
   implementations, it is possible to implement congestion control while
   improving throughput and maintaining high reliability.  In
   circumstances where there is sustained packet loss, there simply is
   not sufficient capacity to maintain existing transmission rates.
   Thus, aggregate throughput will actually improve if congestive back-
   off is implemented.  This is due to elimination of retransmissions
   and the ability to utilize techniques such as RED to desynchronize
   flows.  In addition, with QoS mechanisms such as differentiated
   services, it is possible to mark accounting packets for preferential
   handling so as to provide for lower packet loss if desired.  Thus
   considerable leeway is available to the network administrator in
   controlling the treatment of accounting packets and hard coding
   inelastic behavior is unnecessary.  Typically, systems implementing
   non-volatile storage allow for backlogged accounting data to be
   placed in non-volatile storage pending transmission, so that buffer
   exhaustion resulting from congestive back-off need not be a concern.

   Since UDP is not really a transport protocol, UDP-based accounting
   protocols such as [4] often do not prescribe timeout behavior.  Thus
   implementations may exhibit widely different behavior.  For example,
   one implementation may drop accounting data after three constant
   duration retries to the same server, while another may implement
   exponential back-off to a given server, then switch to another
   server, up to a total timeout interval of twelve hours, while storing
   the untransmitted data on non-volatile storage.  The practical
   difference between these approaches is substantial; the former
   approach will not satisfy archival accounting requirements while the
   latter may.  More predictable behavior can be achieved via use of
   SCTP or TCP transport.
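
   The second behavior described above (exponential back-off to one
   server, then a switch to another, bounded by a total timeout) could
   be planned as in the sketch below.  The function name, its
   parameters, and the per-attempt cap are hypothetical; the sketch
   only computes a schedule and assumes the untransmitted data stays in
   non-volatile storage in the meantime.

```python
def retry_schedule(servers, first_timeout, max_timeout,
                   tries_per_server, total_limit):
    """Yield (send_time, server, timeout) tuples until total_limit
    seconds have elapsed.

    The timeout doubles after each unanswered attempt, is capped at
    max_timeout, and resets when the client moves to the next server.
    """
    if tries_per_server < 1 or first_timeout <= 0:
        raise ValueError("need at least one try and a positive timeout")
    now = 0.0
    index = 0
    while now < total_limit:
        server = servers[index % len(servers)]
        timeout = first_timeout
        for _ in range(tries_per_server):
            if now >= total_limit:
                return
            yield (now, server, timeout)
            now += timeout
            timeout = min(timeout * 2, max_timeout)
        index += 1


# Hypothetical values: 1 s first timeout, 3 tries per server, 20 s limit.
plan = list(retry_schedule(["primary", "secondary"], 1.0, 8.0, 3, 20.0))
```

   Making the schedule explicit in this way is what gives the timeout
   behavior the predictability that the text finds lacking in many
   UDP-based implementations.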

2.1.4. Accounting server failover

In the event of a failure of the primary accounting server, it is desirable for the device to fail over to a secondary server. Providing one or more secondary servers can remove much of the risk of accounting server failure, and as a result use of secondary servers has become commonplace.

   For protocols based on TCP, it is possible for the device to maintain
   connections to both the primary and secondary accounting servers,
   using the secondary connection after expiration of a timer on the
   primary connection.  Alternatively, it is possible to open a
   connection to the secondary accounting server after a timeout or loss
   of the primary connection, or on expiration of a timer.  Thus,
   accounting protocols based on TCP are capable of responding more
   rapidly to connectivity failures than TCP timeouts would otherwise
   allow, at the expense of an increased risk of duplicates.

   With SCTP, it is possible to control transport layer timeout
   behavior, and therefore it is not necessary for the accounting
   application to maintain its own timers.  SCTP also enables
   multiplexing of multiple connections within a single transport
   connection, all maintaining the same congestion control state,
   avoiding the "head of line blocking" issues that can occur with TCP.
   However, since SCTP is not widely available, use of this transport
   can impose an additional implementation burden on the designer.

   For protocols using UDP, transmission to the secondary server can
   occur after a number of retries or timer expiration.  For
   compatibility with congestion avoidance, it is advisable to
   incorporate techniques such as round-trip-time estimation, slow start
   and congestive back-off.  Thus the accounting protocol designer
   utilizing UDP is often led to reinvent techniques that already
   exist in TCP and SCTP.  As a result, the use of raw UDP transport
   in accounting applications is not recommended.

   With any transport it is possible for the primary and secondary
   accounting servers to receive duplicate packets, so support for
   duplicate elimination is required.  Since accounting server failures
   can result in data accumulation on accounting clients, use of non-
   volatile storage can ensure against data loss due to transmission
   timeouts or buffer exhaustion.  On-the-wire interim accounting
   provides only limited benefits in mitigating the effects of
   accounting server failures.
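
   Duplicate elimination across a primary and a secondary server might
   look like the sketch below.  It assumes that every record carries an
   identifier that is unique per client; the text does not say how
   duplicates are recognized, so the (client_id, record_id) key and the
   function name are hypothetical.

```python
def merge_unique(*server_logs):
    """Merge record logs from several accounting servers, keeping the
    first copy of each (client_id, record_id) pair."""
    seen = set()
    merged = []
    for log in server_logs:
        for client_id, record_id, payload in log:
            key = (client_id, record_id)
            if key not in seen:
                seen.add(key)
                merged.append((client_id, record_id, payload))
    return merged


# Record 2 was retransmitted to the secondary after a timeout on the
# primary, so both servers hold a copy (made-up data).
primary_log = [("nas1", 1, "start"), ("nas1", 2, "stop")]
secondary_log = [("nas1", 2, "stop"), ("nas1", 3, "start")]
merged = merge_unique(primary_log, secondary_log)
```

   The same key can be used by a single server to discard a
   retransmission it has already stored, while still acknowledging it
   so that the client stops retrying.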

2.1.5. Application layer acknowledgments

It is possible for the accounting server to experience partial failures. For example, a failure in the database back end could leave the accounting retrieval process or thread operable while the process or thread responsible for storing the data is non-functional. Similarly, it is possible for the accounting application to run out of disk space, making it unable to continue storing incoming session records.

   In such cases it is desirable to distinguish between transport layer
   acknowledgment and application layer acknowledgment.  Even though
   both acknowledgments may be sent within the same packet (such as a
   TCP segment carrying an application layer acknowledgment along with a
   piggy-backed ACK), the semantics are different.  A transport-layer
   acknowledgment means "the transport layer has taken responsibility
   for delivering the data to the application", while an application-
   layer acknowledgment means "the application has taken responsibility
   for the data".

   A common misconception is that use of TCP transport guarantees that
   data is delivered to the application.  However, as noted in RFC 793
   [7]:

    An acknowledgment by TCP does not guarantee that the data has been
    delivered to the end user, but only that the receiving TCP has taken
    the responsibility to do so.

   Therefore, if receiving TCP fails after sending the ACK, the
   application may not receive the data.  Similarly, if the application
   fails prior to committing the data to stable storage, the data may be
   lost.  In order for a sending application to be sure that the data it
   sent was received by the receiving application, either a graceful
   close of the TCP connection or an application-layer acknowledgment is
   required. In order to protect against data loss, it is necessary that
   the application-layer acknowledgment imply that the data has been
   written to stable storage or suitably processed so as to guard
   against loss.

   In the case of partial failures, it is possible for the transport
   layer to acknowledge receipt via transport layer acknowledgment,
   without having delivered the data to the application.  Similarly, the
   application may not complete the tasks necessary to take
   responsibility for the data.

   For example, an accounting server may receive data from the transport
   layer but be incapable of storing the data due to a back end database
   problem or disk fault.  In this case it should not send an
   application layer acknowledgment, even though a transport layer
   acknowledgment is appropriate.  Rather, an application layer error
   message should be sent indicating the source of the problem, such as
   "Backend store unavailable".

   Thus application-layer acknowledgment capability requires not only
   the ability to acknowledge when the application has taken
   responsibility for the data, but also the ability to indicate when
   the application has not taken responsibility for the data, and why.
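
   A server-side handler that follows this rule might be sketched as
   below.  The handle_record function, the JSON-lines file format, and
   the ("ACK", None) and ("ERROR", reason) return values are
   assumptions made for illustration and do not correspond to any
   particular accounting protocol.

```python
import json
import os
import tempfile


def handle_record(record, path):
    """Append one record to a file and report the outcome.

    Returns ("ACK", None) only after the data has been flushed to disk;
    otherwise returns ("ERROR", reason) so that the client keeps its
    copy and can retry or fail over.
    """
    try:
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # ask the OS to commit to stable storage
    except OSError as exc:
        return ("ERROR", f"Backend store unavailable: {exc.strerror}")
    return ("ACK", None)


with tempfile.TemporaryDirectory() as d:
    good = handle_record({"session": "s1", "octets": 10},
                         os.path.join(d, "acct.log"))
    # The directory "missing" does not exist, so the write fails.
    bad = handle_record({"session": "s2", "octets": 5},
                        os.path.join(d, "missing", "acct.log"))
```

   In both calls the transport layer would already have acknowledged
   receipt; only the first call results in the application taking
   responsibility for the data.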

2.1.6. Network failures

Network failures may result in partial or complete loss of connectivity for the accounting client. In the event of partial connectivity loss, it may not be possible to reach the primary accounting server, in which case switch over to the secondary accounting server is necessary. In the event of a network partition, it may be necessary to store accounting events in device memory or non-volatile storage until connectivity can be re-established. As with accounting server failures, on-the-wire interim accounting provides only limited benefits in mitigating the effects of network failures.

2.1.7. Device reboots

In the event of a device reboot, it is desirable to minimize the loss of data on sessions in progress. Such losses may be significant even if the devices themselves are very reliable, due to long-lived sessions, which can comprise a significant fraction of total resource consumption. To guard against loss of these high-value sessions, interim accounting data is typically transmitted over the wire.

When interim accounting data is instead kept in place in non-volatile storage, it becomes possible to guard against data loss in much shorter sessions. This is possible since interim accounting data need only be stored in non-volatile memory until the session completes, at which time the interim data may be replaced by the session record. As a result, interim accounting data need never be sent over the wire, and it is possible to decrease the interim interval so as to provide a very high degree of protection against data loss.

2.1.8. Accounting proxies

In order to maintain high reliability, it is important that accounting proxies pass through transport and application layer acknowledgments and do not store and forward accounting packets. This enables the end-systems to control re-transmission behavior and utilize techniques such as non-volatile storage and secondary servers to improve resilience.

Accounting proxies sending a transport or application layer ACK to the device without receiving one from the accounting server fool the device into thinking that the accounting request had been accepted by the accounting server when this is not the case. As a result, the device can delete the accounting packet from non-volatile storage before it has been accepted by the accounting server. This leaves the
   accounting proxy responsible for delivering accounting packets.  If
   the accounting proxy involves moving parts (e.g. a disk drive) while
   the devices do not, overall system reliability can be reduced.

   Store and forward accounting proxies only add value in situations
   where the accounting subsystem is unreliable.  For example, where
   devices do not implement non-volatile storage and the accounting
   protocol lacks transport and application layer reliability, locating
   the accounting proxy (with its stable storage) close to the device
   can reduce the risk of data loss.

   However, such systems are inherently unreliable so that they are only
   appropriate for use in capacity planning or non-usage sensitive
   billing applications.  If archival accounting reliability is desired,
   it is necessary to engineer a reliable accounting system from the
   start using the techniques described in this document, rather than
   attempting to patch an inherently unreliable system by adding store
   and forward accounting proxies.

2.1.9. Fault resilience summary

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                                         |
   |  Fault        |  Counter-measures                       |
   |               |                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                                         |
   |  Packet       |  Retransmission based on RTT            |
   |  loss         |  Congestion control                     |
   |               |  Well-defined timeout behavior          |
   |               |  Duplicate elimination                  |
   |               |  Interim accounting*                    |
   |               |  Non-volatile storage                   |
   |               |  Cumulative variables                   |
   |               |                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                                         |
   |  Accounting   |  Primary-secondary servers              |
   |  server & net |  Duplicate elimination                  |
   |  failures     |  Interim accounting*                    |
   |               |  Application layer ACK & error msgs.    |
   |               |  Non-volatile storage                   |
   |               |                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                                         |
   |  Device       |  Interim accounting*                    |
   |  reboots      |  Non-volatile storage                   |
   |               |                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Key
   * = limited usefulness without non-volatile storage

   Note:  Accounting proxies are not a reliability enhancement
   mechanism.

2.2. Resource consumption

In the process of growing to meet the needs of providers and customers, accounting management systems consume a variety of resources, including:

      Network bandwidth
      Memory
      Non-volatile storage
      State on the accounting management system
      CPU on the management system and managed devices

   In order to understand the limits to scaling, we examine each of
   these resources in turn.

2.2.1. Network bandwidth

Accounting management systems consume network bandwidth in transferring accounting data. The network bandwidth consumed is proportional to the amount of data transferred, as well as required network overhead. Since accounting data for a given event may be 100 octets or less, if each event is transferred individually, overhead can represent a considerable proportion of total bandwidth consumption. As a result, it is often desirable to transfer accounting data in batches, enabling network overhead to be spread over a larger payload, and enabling efficient use of compression. As noted in [48], compression can be enabled in the accounting protocol, or can be done at the IP layer as described in [5].
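
   The effect of batching on overhead can be checked with some simple
   arithmetic.  The sketch below assumes a per-packet overhead of 28
   octets (an IPv4 header without options plus a UDP header) and
   ignores link-layer framing and any accounting protocol header; both
   the function name and that figure are assumptions made for
   illustration.

```python
HEADER_OCTETS = 28  # assumed overhead: IPv4 (20) + UDP (8), no options


def overhead_fraction(event_octets, events_per_packet,
                      header_octets=HEADER_OCTETS):
    """Fraction of the octets on the wire that is header rather than
    accounting data."""
    payload = event_octets * events_per_packet
    return header_octets / (header_octets + payload)


single = overhead_fraction(100, 1)    # one 100-octet event per packet
batched = overhead_fraction(100, 10)  # ten such events per packet
```

   Under these assumptions a packet carrying one 100-octet event spends
   about 22% of its octets on headers, while a packet carrying ten
   events spends under 3%.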

2.2.2. Memory

In accounting systems without non-volatile storage, accounting data must be stored in volatile memory during the period between when it is generated and when it is transferred. The resulting memory consumption will depend on retry and retransmission algorithms. Since systems designed for high reliability will typically wish to retry for long periods, or may store interim accounting data, the resulting memory consumption can be considerable. As a result, if non-volatile storage is unavailable, it may be desirable to compress accounting data awaiting transmission. As noted earlier, implementors of interim accounting should take care to ensure against excessive memory usage by overwriting older interim accounting data with newer data for the same session rather than accumulating interim data in the buffer.

2.2.3. Non-volatile storage

Since accounting data stored in memory will typically be lost in the event of a device reboot or a timeout, it may be desirable to provide non-volatile storage for undelivered accounting data. With the costs of non-volatile storage declining rapidly, network devices will be increasingly capable of incorporating non-volatile storage support over the next few years.

Non-volatile storage may be used to store interim or session records. As with memory utilization, interim accounting overwrite is desirable so as to prevent excessive storage consumption. Note that the use of ASCII data representation enables use of highly efficient text compression algorithms that can minimize storage requirements. Such compression algorithms are typically applied only to session records, so that interim data can still be overwritten in place.

2.2.4. State on the accounting management system

In order to keep track of received accounting data, accounting management systems may need to keep state on managed devices or concurrent sessions. Since the number of devices is typically much smaller than the number of concurrent sessions, it is desirable to keep only per-device state if possible.

2.2.5. CPU requirements

CPU consumption of the managed and managing nodes will be proportional to the complexity of the required accounting processing. Operations such as ASN.1 encoding and decoding, compression/decompression, and encryption/decryption can consume considerable resources, both on accounting clients and servers. The effect of these operations on accounting system reliability should not be underestimated, particularly in the case of devices with moderate CPU resources. In the event that devices are overtaxed by accounting tasks, it is likely that overall device reliability will suffer.

2.2.6. Efficiency measures

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                                         |
   |  Resource     |  Efficiency measures                    |
   |               |                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                                         |
   |  Network      |  Batching                               |
   |  Bandwidth    |  Compression                            |
   |               |                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                                         |
   |  Memory       |  Compression                            |
   |               |  Interim accounting overwrite           |
   |               |                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                                         |
   |  Non-volatile |  Compression                            |
   |  Storage      |  Interim accounting overwrite           |
   |               |                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                                         |
   |  System       |  Per-device state                       |
   |  state        |                                         |
   |               |                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                                         |
   |  CPU          |  Hardware assisted                      |
   |  requirements |  compression/encryption                 |
   |               |                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

2.3. Data collection models

Several data collection models are currently in use for the purposes of accounting data collection. These include:

      Polling model
      Event-driven model without batching
      Event-driven model with batching
      Event-driven polling model

2.3.1. Polling model

In the polling model, an accounting manager will poll devices for accounting information at regular intervals. In order to ensure against loss of data, the polling interval will need to be shorter than the maximum time that accounting data can be stored on the polled device. For devices without non-volatile storage, this is typically determined by available memory; for devices with non-volatile storage the maximum polling interval is determined by the size of non-volatile storage.

The polling model results in an accumulation of data within individual devices, and as a result, data is typically transferred to the accounting manager in a batch, resulting in an efficient transfer process. In terms of Accounting Manager state, polling systems scale with the number of managed devices, and system bandwidth usage scales with the amount of data transferred.

Without non-volatile storage, the polling model results in loss of accounting data due to device reboots, but not due to packet loss or network failures of sufficiently short duration to be handled within available memory. This is because the Accounting Manager will continue to poll until the data is received. In situations where operational difficulties are encountered, the volume of accounting data will frequently increase so as to make data loss more likely. However, in this case the polling model will detect the problem since attempts to reach the managed devices will fail.

The polling model scales poorly for implementation of shared use or roaming services, including wireless data, Internet telephony, QoS provisioning or Internet access. This is because in order to retrieve accounting data for users within a given domain, the Accounting Management station would need to periodically poll all devices in all domains, most of which would not contain any relevant data. There are also issues with processing delay, since use of a polling interval also implies an average processing delay of half the polling interval. This may be too high for accounting data that requires low processing delay. Thus the event-driven polling or the pure event-driven approach is more appropriate for usage sensitive billing applications such as shared use or roaming implementations.

Per-device state is typical of polling-based network management systems, which often also carry out accounting management functions, since network management systems need to keep track of the state of network devices for operational purposes. These systems offer average processing delays equal to half the polling interval.

2.3.2. Event-driven model without batching

In the event-driven model, a device will contact the accounting server or manager when it is ready to transfer accounting data. Most event-driven accounting systems, such as those based on RADIUS accounting, described in [4], transfer only one accounting event per packet, which is inefficient.

Without non-volatile storage, a pure event-driven model typically stores accounting events that have not yet been delivered only until the timeout interval expires. As a result this model has the smallest memory requirements. Once the timeout interval has expired, the accounting event is lost, even if the device has sufficient buffer space to continue to store it. As a result, the event-driven model is the least reliable, since accounting data loss will occur due to device reboots, sustained packet loss, or network failures of duration greater than the timeout interval.

In event-driven protocols without a "keep alive" message, accounting servers cannot infer a device failure when no messages arrive for an extended period. Thus, event-driven accounting systems are typically not useful in monitoring of device health.

The event-driven model is frequently used in shared use networks and roaming, since this model sends data to the recipient domains without requiring them to poll a large number of devices, most of which have no relevant data. Since the event-driven model typically does not support batching, it permits accounting records to be sent with low processing delay, enabling application of fraud prevention techniques. However, because roaming accounting events are frequently of high value, the poor reliability of this model is an issue. As a result, the event-driven polling model may be more appropriate.

Per-session state is typical of event-driven systems without batching. As a result, the event-driven approach scales poorly. However, event-driven systems offer the lowest processing delay since events are processed immediately and there is no possibility of an event requiring low processing delay being caught behind a batch transfer.

2.3.3. Event-driven model with batching

In the event-driven model with batching, a device will contact the accounting server or manager when it is ready to transfer accounting data. The device can contact the server when a batch of a given size has been gathered, when data of a certain type is available or after a minimum time period has elapsed. Such systems can transfer more than one accounting event per packet and are thus more efficient.

   An event-driven system with batching will store accounting events
   that have not yet been delivered up to the limits of memory.  As a
   result, accounting data loss will occur due to device reboots, but
   not due to packet loss or network failures of sufficiently short
   duration to be handled within available memory.  Note that while
   transfer efficiency will increase with batch size, without non-
   volatile storage, the potential data loss from a device reboot will
   also increase.

   Where event-driven systems with batching have a keep-alive interval
   and run over reliable transport, the accounting server can assume
   that a failure has occurred if no messages are received within the
   keep-alive interval.  Thus, such implementations can be useful in
   monitoring of device health.  When used for this purpose the average
   time delay prior to failure detection is one half the keep-alive
   interval.

   Through implementation of a scheduling algorithm, event-driven
   systems with batching can deliver appropriate service to accounting
   events that require low processing delay.  For example, high-value
   inter-domain accounting events could be sent immediately, thus
   enabling use of fraud-prevention techniques, while all other events
   would be batched.  However, there is a possibility that an event
   requiring low processing delay will be caught behind a batch transfer
   in progress.  Thus the maximum processing delay is proportional to
   the maximum batch size divided by the link speed.
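
   A very simple scheduling policy of this kind, together with the
   worst-case wait behind a batch, is sketched below.  The
   BatchingSender class, the "urgent" flag, and the transmit callback
   are hypothetical, and the time-based trigger mentioned earlier is
   left out for brevity.

```python
class BatchingSender:
    def __init__(self, batch_size, transmit):
        self.batch_size = batch_size
        self.transmit = transmit  # callable that takes a list of events
        self._pending = []

    def submit(self, event, urgent=False):
        if urgent:
            # Events needing low processing delay bypass the batch.
            self.transmit([event])
            return
        self._pending.append(event)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self._pending:
            self.transmit(self._pending)
            self._pending = []


def worst_case_wait(max_batch_octets, link_bits_per_second):
    """Seconds an urgent event may wait behind a batch that is already
    being transmitted on the link."""
    return max_batch_octets * 8 / link_bits_per_second
```

   For instance, worst_case_wait(64000, 64000) evaluates to 8.0, so on
   a 64 kbit/s link a 64000-octet batch can hold up an urgent event for
   up to eight seconds; a smaller maximum batch size lowers that bound.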

   Event-driven systems with batching scale with the number of active
   devices.  As a result this approach scales better than the pure
   event-driven approach, or even the polling approach, and is
   equivalent in terms of scaling to the event-driven polling approach.
   However, the event-driven batching approach has lower processing
   delay than the event-driven polling approach, since delivery of
   accounting data requires fewer round-trips and events requiring low
   processing delay can be accommodated if a scheduling algorithm is
   employed.

2.3.4. Event-driven polling model

In the event-driven polling model an accounting manager will poll the device for accounting data only when it receives an event. The accounting client can generate an event when a batch of a given size has been gathered, when data of a certain type is available or after a minimum time period has elapsed. Note that while transfer efficiency will increase with batch size, without non-volatile storage, the potential data loss from a device reboot will also increase.

   Without non-volatile storage, an event-driven polling model will lose
   data due to device reboots, but not due to packet loss or network
   partitions of short duration.  Unless a minimum delivery interval is
   set, event-driven polling systems are not useful in monitoring of
   device health.

   The event-driven polling model can be suitable for use in roaming
   since it permits accounting data to be sent to the roaming partners
   with low processing delay.  At the same time non-roaming accounting
   can be handled via more efficient polling techniques, thereby
   providing the best of both worlds.

   Where batching can be implemented, the state required in event-driven
   polling can be reduced to scale with the number of active devices.
   If portions of the network vary widely in usage, then this state may
   actually be less than that of the polling approach.  Note that
   processing delay in this approach is higher than in event-driven
   accounting with batching since at least two round-trips are required
   to deliver data: one for the event notification, and one for the
   resulting poll.

2.3.5. Data collection summary

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 |                    |                  |
   | Model           | Pros               | Cons             |
   |                 |                    |                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Polling         | Per-device state   | Not robust       |
   |                 | Robust against     | against device   |
   |                 | packet loss        | reboot, server   |
   |                 | Batch transfers    | or network       |
   |                 |                    | failures*        |
   |                 |                    | Polling interval |
   |                 |                    | determined by    |
   |                 |                    | storage limit    |
   |                 |                    | High processing  |
   |                 |                    | delay            |
   |                 |                    | Unsuitable for   |
   |                 |                    | use in roaming   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Event-driven,   | Lowest processing  | Not robust       |
   | no batching     | delay              | against packet   |
   |                 | Suitable for       | loss, device     |
   |                 | use in roaming     | reboot, or       |
   |                 |                    | network          |
   |                 |                    | failures*        |
   |                 |                    | Low efficiency   |
   |                 |                    | Per-session state|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Event-driven,   | Single round-trip  | Not robust       |
   | with batching   | latency            | against device   |
   | and             | Batch transfers    | reboot, network  |
   | scheduling      | Suitable for       | failures*        |
   |                 | use in roaming     |                  |
   |                 | Per active device  |                  |
   |                 | state              |                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Event-driven    | Batch transfers    | Not robust       |
   | polling         | Suitable for       | against device   |
   |                 | use in roaming     | reboot, network  |
   |                 | Per active device  | failures*        |
   |                 | state              | Two round-trip   |
   |                 |                    | latency          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Key
   * = addressed by non-volatile storage

