In this non-normative appendix, we provide an overview of some existing techniques and standard proposals for each network telemetry module.
NETCONF [RFC 6241] is a popular network management protocol recommended by the IETF. Its core strength is configuration management, but it can also be used for data collection. YANG-Push [RFC 8639] [RFC 8641] extends NETCONF and enables subscriber applications to request a continuous, customized stream of updates from a YANG datastore. Providing such visibility into changes made to YANG configuration and operational objects enables new capabilities based on the remote mirroring of configuration and operational state. Moreover, a distributed notification mechanism [NETCONF-DISTRIB-NOTIF] using a UDP-based transport [NETCONF-UDP-NOTIF] improves the efficiency of NETCONF-based telemetry.
gNMI [gnmi] is a network management protocol based on the gRPC [grpc] Remote Procedure Call (RPC) framework. With a single gRPC service definition, both configuration and telemetry can be covered. gRPC is an open-source micro-service communication framework based on HTTP/2 [RFC 7540]. It provides a number of capabilities that are well-suited for network telemetry, including:
- A full-duplex streaming transport model; when combined with a binary encoding mechanism, it provides good telemetry efficiency.
- A higher-level feature consistency across platforms that common HTTP/2 libraries typically do not provide. This characteristic is especially valuable because telemetry data collectors normally reside on a large variety of platforms.
- A built-in load-balancing and failover mechanism.
The BGP Monitoring Protocol (BMP) [RFC 7854] is used to monitor BGP sessions and is intended to provide a convenient interface for obtaining route views.
BGP routing information is collected from the monitored device(s) and sent to the BMP monitoring station over a BMP TCP session. The BGP peers are monitored through the BMP Peer Up and Peer Down notifications. The BGP routes (including Adj-RIB-In [RFC 7854], Adj-RIB-Out [RFC 8671], and Loc-RIB [RFC 9069]) are encapsulated in the BMP Route Monitoring Message and the BMP Route Mirroring Message, providing both an initial table dump and real-time route updates. In addition, BGP statistics are reported through the BMP Stats Report Message, which can be either timer-triggered or event-driven. Future BMP extensions could further enrich BGP monitoring applications.
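As a rough illustration of how a monitoring station might tell these message types apart, the sketch below decodes the 6-byte BMP common header (version, message length, message type) defined in RFC 7854. The function and variable names are hypothetical, and the per-message bodies are not parsed.

```python
import struct

# Message type codes carried in the BMP common header (RFC 7854).
BMP_MESSAGE_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation",
    5: "Termination",
    6: "Route Mirroring",
}

# Network byte order: 1-byte version, 4-byte message length, 1-byte type.
COMMON_HEADER = struct.Struct("!BIB")


def parse_common_header(data: bytes) -> dict:
    """Decode the common header at the start of a BMP message (illustrative helper)."""
    if len(data) < COMMON_HEADER.size:
        raise ValueError("need at least 6 bytes for a BMP common header")
    version, length, msg_type = COMMON_HEADER.unpack_from(data)
    return {
        "version": version,
        "length": length,  # total message length, header included
        "type": BMP_MESSAGE_TYPES.get(msg_type, f"Unknown ({msg_type})"),
    }


# A hand-built header for an Initiation message with an empty body.
sample = struct.pack("!BIB", 3, 6, 4)
header = parse_common_header(sample)
```

A real station would read `length` bytes from the TCP stream for each message and then dispatch on the type to a body parser.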
The Alternate-Marking method enables efficient measurements of packet loss, delay, and jitter both in IP and Overlay Networks, as presented in [RFC 8321] and [RFC 8889].
This technique can be applied to point-to-point and multipoint-to-multipoint flows. Alternate Marking creates batches of packets by alternating the value of 1 bit (or a label) of the packet header. These batches of packets are unambiguously recognized over the network, and the comparison of packet counters for each batch allows the packet loss calculation. The same idea can be applied to delay measurement by selecting ad hoc packets with a marking bit dedicated for delay measurements.
The Alternate-Marking method requires two counters per marking period for each monitored flow. For instance, with n measurement points and m monitored flows, the number of packet counters for each time interval is on the order of n*m*2 (one per color).
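The counter arithmetic above can be sketched as follows. This is a minimal illustration rather than an implementation of RFC 8321: the colors "A" and "B", the sample counter values, and the function names are all assumptions made for the example.

```python
def counters_per_interval(points: int, flows: int, colors: int = 2) -> int:
    """Order of magnitude of packet counters per time interval: n * m * 2."""
    return points * flows * colors


def batch_loss(sent: dict, received: dict) -> dict:
    """Per-color packet loss between two measurement points for one flow.

    Each dict maps a color (one marking batch) to the packet count seen
    for that batch at a measurement point.
    """
    return {color: sent[color] - received[color] for color in sent}


# Hypothetical counters for one flow at an upstream and a downstream point.
sent = {"A": 1000, "B": 1200}
received = {"A": 998, "B": 1200}

loss = batch_loss(sent, received)          # two packets lost in the "A" batch
total = counters_per_interval(points=4, flows=10)
```

Because a batch's counter is only compared after the marking color has switched, both measurement points read a counter that is no longer being incremented.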
Since networks offer rich sets of network performance measurement data (e.g., packet counters), conventional approaches run into limitations. The bottleneck is the generation and export of the data and the amount of data that can be reasonably collected from the network. In addition, management tasks related to determining and configuring which data to generate lead to significant deployment challenges.
The Multipoint Alternate-Marking approach, described in [RFC 8889], aims to resolve this issue and make the performance monitoring more flexible in case a detailed analysis is not needed.
An application orchestrates network performance measurement tasks across the network to allow for optimized monitoring. The application can choose how roughly or precisely to configure measurement points depending on the application's requirements.
Using Alternate Marking, it is possible to monitor a multipoint network without in-depth examination by using network clustering, where clusters are subnetworks that preserve the same property as the entire network. When packet loss occurs or delay is too high, specific filtering criteria can be applied to obtain a more detailed analysis by using a different combination of clusters, up to a per-flow measurement as described in the Alternate-Marking document [RFC 8321].
In summary, an application can configure end-to-end network monitoring. If the network does not experience issues, this approximate monitoring is good enough and is very cheap in terms of network resources. However, in case of problems, the application becomes aware of the issues from this approximate monitoring and, in order to localize the portion of the network that has issues, configures the measurement points more extensively, allowing more detailed monitoring to be performed. After the detection and resolution of the problem, the initial approximate monitoring can be used again.
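The coarse-to-fine procedure summarized above might be sketched as a recursive search over nested clusters. The `Cluster` type, its fields, and the sample numbers are hypothetical; a real application would obtain the in/out packet counts from the measurement points that bound each cluster.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """A portion of the network with aggregate in/out packet counters."""
    name: str
    packets_in: int
    packets_out: int
    subclusters: List["Cluster"] = field(default_factory=list)

    @property
    def loss(self) -> int:
        return self.packets_in - self.packets_out


def localize(cluster: Cluster, threshold: int = 0) -> List[str]:
    """Return the names of the smallest clusters whose loss exceeds the threshold.

    Subclusters are examined only when the enclosing cluster shows a
    problem, mirroring the switch from approximate to detailed monitoring.
    """
    if cluster.loss <= threshold:
        return []
    finer = [name for sub in cluster.subclusters for name in localize(sub, threshold)]
    return finer or [cluster.name]


network = Cluster("network", 1000, 990, [
    Cluster("c1", 600, 600),
    Cluster("c2", 400, 390, [
        Cluster("c2a", 200, 200),
        Cluster("c2b", 200, 190),
    ]),
])

suspects = localize(network)
```

With a healthy network the search stops at the top-level cluster, so only the cheapest, end-to-end counters are ever consulted.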
A hardware-based Dynamic Network Probe (DNP) [OPSAWG-DNP4IQ] provides a programmable means to customize the data that an application collects from the data plane. A direct benefit of DNP is the reduction of the exported data. A full DNP solution covers several components, including data source, data subscription, and data generation. The data subscription needs to define the derived data that can be composed from raw data sources. The data generation takes advantage of moderate in-network computing to produce the desired data.
While DNP can introduce unforeseeable flexibility to the data plane telemetry, it also faces some challenges. It requires a flexible data plane that can be dynamically reprogrammed at runtime. The programming Application Programming Interface (API) is yet to be defined.
Traffic on a network can be seen as a set of flows passing through network elements. IP Flow Information Export (IPFIX) [RFC 7011] provides a means of transmitting traffic flow information for administrative or other purposes. A typical IPFIX-enabled system includes a pool of Metering Processes that collect data packets at one or more Observation Points, optionally filter them, and aggregate information about these packets. An Exporter then gathers each of the Observation Points together into an Observation Domain and sends this information via the IPFIX protocol to a Collector.
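The collect, filter, and aggregate steps of a Metering Process can be sketched as below. This is only a conceptual illustration: it does not produce the IPFIX wire format, and the packet dictionaries, the `FlowKey` fields, and the `meter` function are assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Fields used here to group packets into a flow (a common 5-tuple)."""
    src: str
    dst: str
    proto: int
    sport: int
    dport: int


def meter(packets, keep=lambda pkt: True):
    """Filter observed packets, then aggregate per-flow packet and octet counts."""
    flows = defaultdict(lambda: {"packets": 0, "octets": 0})
    for pkt in packets:
        if not keep(pkt):
            continue  # optional filtering step
        key = FlowKey(pkt["src"], pkt["dst"], pkt["proto"], pkt["sport"], pkt["dport"])
        flows[key]["packets"] += 1
        flows[key]["octets"] += pkt["length"]
    return dict(flows)


# Hypothetical packets seen at one Observation Point.
observed = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "proto": 6, "sport": 1234, "dport": 80, "length": 100},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "proto": 6, "sport": 1234, "dport": 80, "length": 200},
    {"src": "10.0.0.3", "dst": "10.0.0.2", "proto": 17, "sport": 5353, "dport": 53, "length": 50},
]

# Keep TCP only (protocol number 6), then aggregate.
records = meter(observed, keep=lambda pkt: pkt["proto"] == 6)
```

An Exporter would then encode records like these according to a template and send them to a Collector.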
Classical passive and active monitoring and measurement techniques are either inaccurate or resource consuming. It is preferable to directly acquire data associated with a flow's packets when the packets pass through a network. In situ OAM (IOAM) [RFC 9197], a data generation technique, embeds a new instruction header in user packets, and the instruction directs the network nodes to add the requested data to the packets. Thus, at the path's end, the packet's experience gained on the entire forwarding path can be collected. Such firsthand data is invaluable to many network OAM applications.
However, IOAM also faces some challenges. Issues of performance impact, security, scalability and overhead limits, encapsulation difficulties in some protocols, and cross-domain deployment need to be addressed.
Postcard-Based Telemetry (PBT), as embodied in IOAM Direct Export (DEX) [IPPM-IOAM-DIRECT-EXPORT] and [IPPM-POSTCARD-BASED-TELEMETRY], is a complementary technique to the passport-based IOAM [RFC 9197]. PBT directly exports data at each node through an independent packet. At the cost of higher bandwidth overhead and the need for data correlation, PBT shows several unique advantages. It can also help to identify the packet drop location in case a packet is dropped on its forwarding path.
Various data planes raise unique OAM requirements. The IETF has published OAM technique and framework documents (e.g., [RFC 8924] and [RFC 5085]) targeting different data planes such as Multiprotocol Label Switching (MPLS), L2 Virtual Private Network (VPN), Network Virtualization over Layer 3 (NVO3), Virtual Extensible LAN (VXLAN), Bit Index Explicit Replication (BIER), Service Function Chaining (SFC), Segment Routing (SR), and Deterministic Networking (DETNET). The aforementioned data plane telemetry techniques can be used to enhance the OAM capability on such data planes.
To ensure that the information provided by external event detectors and used by the network management solutions is meaningful for management purposes, the network telemetry framework must ensure that such detectors (sources) are easily connected to the management solutions (sinks). This requires the specification of a list of potential external data sources that could be of interest in network management and matching it to the connectors and/or interfaces required to connect them.
Categories of external event sources that may be of interest to network management include:
- Smart objects and sensors. With the consolidation of the Internet of Things (IoT), any network system will have many smart objects attached to its physical surroundings and logical operation environments. Most of these objects will be essentially based on sensors of many kinds (e.g., temperature, humidity, and presence), and the information they provide can be very useful for the management of the network, even when they are not specifically deployed for such purpose. Elements of this source type will usually provide a specific protocol for interaction, especially one of the protocols related to IoT, such as the Constrained Application Protocol (CoAP).
- Online news reporters. Several online news services have the ability to provide an enormous quantity of information about different events occurring in the world. Some of those events can have an impact on the network system managed by a specific framework; therefore, such information may be of interest to the management solution. For instance, diverse security reports, such as Common Vulnerabilities and Exposures (CVEs), can be issued by the corresponding authority and used by the management solution to update the managed system, if needed. Instead of a specific protocol and data format, the sources of this kind of information usually follow a relaxed but structured format. This format will be part of both the ontology and information model of the telemetry framework.
- Global event analyzers. The advance of big data analyzers provides a huge amount of information and, more interestingly, the identification of events detected by analyzing many data streams from different origins. In contrast with the other types of sources, which are focused on specific events, the detectors of this source type will detect generic events. For example, during a sports event, an unexpected turn makes it especially compelling, and many people connect to sites that are reporting on the event. The underlying networks supporting the services that cover the event can be affected by such a situation, so their management solutions should be aware of it. In contrast with the other source types, a new information model, format, and reporting protocol are required to integrate the detectors of this type with the management solution.
Additional detector types can be added to the system, but generally they will be the result of composing the properties offered by these main classes.
To allow external event detectors to be properly integrated with other management solutions, both elements must expose interfaces and protocols that are suited to their particular objectives. Since external event detectors will be focused on providing their information to their main consumers, which generally will not be limited to the network management solutions, the framework must include the definition of the connectors required to ensure that the interconnection between detectors (sources) and their consumers within the management systems (sinks) is effective.
In some situations, the interconnection between external event detectors and the management system is via the management plane. For those situations, there will be a special connector that provides the typical interfaces found in most other elements connected to the management plane. For instance, the interfaces could accomplish this with a specific data model (YANG) and specific telemetry protocol, such as NETCONF, YANG-Push, or gRPC.