The workshop was held across three days, with one all-group discussion slot per day. The program committee organized the accepted paper submissions into three main themes, one for each discussion slot. During each slot, the papers in that theme were presented sequentially, and an open discussion was held at the end of the day.
The first day of the workshop focused on the existing state of the relationship between network management and encrypted traffic from various angles. Presentations ranged from discussing classifiers using machine learning to recognize traffic, to advanced techniques for evading traffic analysis, to user privacy considerations.
After an introduction that covered the goals of the workshop and the starting questions (as described in
Section 1), there were four presentations followed by open discussion.
Many existing network management techniques are passive in nature: they do not rely on explicit signals from end hosts to negotiate with network middleboxes but instead inspect packets to recognize traffic and apply various policies. Traffic classification, as a passive technique, is being challenged by increasing encryption.
Traffic classification is commonly performed by networks to infer what applications and services are being used. This information is in turn used for capacity and resource planning, Quality-of-Service (QoS) monitoring, traffic prioritization, network access control, identity management, and malware detection. However, since classification commonly relies on recognizing unencrypted properties of packets in a flow, increasing encryption of traffic can decrease the effectiveness of classification.
The amount of classification that can be performed on traffic also provides useful insight into how "leaky" the protocols used by applications are and points to areas where information is visible to any observer, who may or may not be malicious.
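As a purely illustrative sketch (not drawn from any workshop submission), the following Python fragment shows the kind of classification described above, using properties that often remain visible despite payload encryption, such as the destination port and, where present, the TLS Server Name Indication (SNI). The rule tables and flow fields are hypothetical.

   # Illustrative only: coarse flow classification from still-visible metadata.
   from dataclasses import dataclass
   from typing import Optional

   @dataclass
   class FlowMetadata:
       dst_port: int
       sni: Optional[str]   # visible in a plaintext TLS ClientHello

   PORT_RULES = {443: "web", 853: "dns-over-tls", 25: "mail"}
   SNI_RULES = {"video.example.net": "streaming",
                "updates.example.org": "software-update"}

   def classify(flow: FlowMetadata) -> str:
       # Return a coarse application label for a flow, or "unknown".
       if flow.sni and flow.sni in SNI_RULES:
           return SNI_RULES[flow.sni]
       return PORT_RULES.get(flow.dst_port, "unknown")

   print(classify(FlowMetadata(dst_port=443, sni="video.example.net")))  # streaming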
Frequently, classification has been based on specific rules crafted by experts, but there is also a move toward using machine learning to recognize patterns. Machine-learning models based on "deep learning" generally rely on analyzing a large set of traffic over time and have trouble reacting quickly to changes in traffic patterns.
Models that are based on closed-world data sets also become less useful over time as traffic changes. [
JIANG] describes experiments showing that a model that performed with high accuracy on an initial data set degrades severely when run on a newer data set containing traffic from the same applications; degradation was observed after as little as one week. However, the set of packet and flow features that were useful to the models stayed mostly consistent, even when the models themselves needed to be updated. Models with a reduced feature space showed better resiliency and could be retrained more quickly. Based on this, [
JIANG] recommends further research to determine which features of IP packets are most useful for focused machine-learning analysis. [
WU] also recommends further research investment in Artificial Intelligence (AI) analysis for network management.
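As a minimal sketch only (using synthetic data, numpy, and scikit-learn, not the experiments or data sets from [JIANG]), the following fragment illustrates the reported effect: a classifier trained on a small set of per-flow features performs well on the traffic it was trained on but degrades once the statistics of the same applications drift. The feature choices and numbers are invented, and exact results will vary.

   # Synthetic illustration of model drift; not the [JIANG] experiments.
   import numpy as np
   from sklearn.ensemble import RandomForestClassifier

   rng = np.random.default_rng(0)

   def make_flows(n, drift=0.0):
       # Two synthetic classes with features [mean packet size, flow duration].
       web  = rng.normal([900, 10.0], [100, 2.0], size=(n, 2))
       voip = rng.normal([200 + 500 * drift, 30.0 - 15.0 * drift],
                         [30, 5.0], size=(n, 2))
       return np.vstack([web, voip]), np.array([0] * n + [1] * n)

   X_old, y_old = make_flows(500)             # initial data set
   X_new, y_new = make_flows(500, drift=1.0)  # later traffic; one class drifted

   clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_old, y_old)
   print("accuracy on initial data:", clf.score(X_old, y_old))
   print("accuracy after drift:    ", clf.score(X_new, y_new))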
Just as traffic classification is continually adapting, techniques to prevent traffic analysis and to obfuscate application and user traffic are continually evolving. An invited talk from the authors of [
DITTO] presented the workshop with a novel approach for building a highly robust system to prevent unwanted traffic analysis.
Traffic obfuscation is usually performed by changing the timing of packets or by adding padding to data. These practices can be costly and can negatively impact performance. [
DITTO] demonstrated the feasibility of applying traffic obfuscation to aggregated traffic in the network at line rate and with minimal overhead.
While traffic obfuscation techniques are not widely deployed today, this study underlines the need for continuous effort to keep traffic models updated over time, the challenges of the classification of encrypted traffic, as well as the opportunities to further enhance user privacy.
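As a minimal sketch of the padding primitive mentioned above (this is not the DITTO approach, and the bucket sizes and record handling are invented), the following Python fragment pads application records to a small set of fixed sizes so that an observer sees only a few distinct on-wire lengths.

   # Illustrative padding to fixed-size buckets; bucket sizes are examples.
   BUCKETS = [256, 512, 1024, 1500]

   def pad_to_bucket(payload: bytes) -> bytes:
       # Pad a payload with zero bytes up to the next bucket boundary.
       # A real scheme would also encode the original length inside the
       # encrypted payload so that the receiver can strip the padding.
       for size in BUCKETS:
           if len(payload) <= size:
               return payload + b"\x00" * (size - len(payload))
       return payload  # larger than every bucket: send as-is (or fragment)

   print(len(pad_to_bucket(b"GET /index.html")))  # 256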
The Privacy Enhancements and Assessments Research Group (PEARG) is working on a document to discuss guidelines for measuring traffic on the Internet in a safe and privacy-friendly way [
LEARMONTH]. These guidelines and principles provide another view on the discussion of passive classification and analysis of traffic.
Consent for the collection and measurement of metadata is an important consideration in deploying network measurement techniques. This consent can be given explicitly as informed consent, given by proxy, or only implied. For example, a user might need to consent to certain measurement and traffic treatment when joining a network.
Various techniques for data collection can also improve user privacy, such as discarding data after a short period of time, masking aspects of data that contain user-identifying information, reducing the accuracy of collected data, and aggregating data.
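A brief sketch of two of these techniques follows, assuming hypothetical IPv4 client addresses: accuracy is reduced by discarding host bits so that only /24 prefixes are retained, and observations are aggregated into per-prefix counts rather than stored per user.

   # Illustrative: truncate client addresses and keep only aggregate counts.
   from collections import Counter
   import ipaddress

   def truncate_to_prefix(addr: str, prefix_len: int = 24) -> str:
       # Zero the host bits of an IPv4 address, keeping only the prefix.
       return str(ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False))

   observed = ["192.0.2.17", "192.0.2.201", "198.51.100.5"]   # example data
   counts = Counter(truncate_to_prefix(a) for a in observed)
   print(counts)   # Counter({'192.0.2.0/24': 2, '198.51.100.0/24': 1})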
The intents and goals of users, application developers, and network operators align in some cases, but not in others. One of the recurring challenges that was discussed was the lack of a clear way to understand or to communicate intents and requirements. Both traffic classification and traffic obfuscation attempt to change the visibility of traffic without cooperation of other parties: traffic classification is an attempt by the network to inspect application traffic without coordination from applications, and traffic obfuscation is an attempt by the application to hide that same traffic as it transits a network.
Traffic adaptation and prioritization is one dimension in which the incentives for cooperation seem most clear. Even if an application is trying to prevent the leaking of metadata, it could benefit from signals from the network about sudden capacity changes that can help it adapt its application quality, such as bitrates and codecs. Such signaling may not be appropriate for the most privacy-sensitive applications, like Tor, but could be applicable for many others. There are existing protocols that involve explicit signaling between applications and networks, such as Explicit Congestion Notification (ECN) [
RFC 3168], but that has yet to see wide adoption.
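As a small illustration of the signal that ECN provides (a decoding sketch, not an implementation of [RFC 3168]), the ECN codepoint occupies the two least-significant bits of the IPv4 TOS / IPv6 Traffic Class byte; a congested router can mark ECN-capable packets with CE instead of dropping them, and the receiver echoes the mark back to the sender.

   # Decode the ECN codepoint from a TOS / Traffic Class byte (RFC 3168).
   ECN_CODEPOINTS = {
       0b00: "Not-ECT (endpoint does not support ECN)",
       0b01: "ECT(1)",
       0b10: "ECT(0)",
       0b11: "CE (congestion experienced)",
   }

   def ecn_of(tos_byte: int) -> str:
       return ECN_CODEPOINTS[tos_byte & 0b11]

   print(ecn_of(0x02))  # ECT(0)
   print(ecn_of(0x03))  # CE (congestion experienced)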
Managed networks (such as private corporate networks) were brought up in several comments as particularly challenging for meeting management requirements while maintaining encryption and privacy. These networks can have legal and regulated requirements for detection of specific fraudulent or malicious traffic.
Personal networks that enable managed parental controls have similar complications with encrypted traffic and user privacy. In these scenarios, the parental controls that are operated by the network may be as simple as a DNS filter, which can be made ineffective by a device routing traffic to an alternate DNS resolver.
The second day of the workshop focused on emerging techniques for analyzing, managing, or monitoring encrypted traffic. Presentations covered advanced classification and identification, including machine-learning techniques, for purposes such as managing network flows and monitoring or monetizing usage.
After an introduction that covered the goals of the workshop and the starting questions (as described in
Section 1), there were three presentations, followed by open discussion.
It is the intent of end-to-end encryption of traffic to create a barrier between entities inside the communication channel and everyone else, including network operators. Therefore, any attempt to overcome that intentional barrier requires collaboration between the inside and outside entities. At a minimum, those entities must agree on the benefits of overcoming the barrier (or solving the problem), agree that costs are proportional to the benefits, and agree to additional limitations or safeguards against bad behavior by collaborators including other non-insiders [
BARNES].
The Internet is designed for interoperability, which means an outside entity wishing to collaborate with the inside might be any of a number of intermediaries and not, say, a specific person who can be trusted in the human sense. Additionally, the use of encryption, especially network-layer or transport-layer encryption, makes discoverability dynamic, opportunistic, or perfunctory. These realities point to the need to ask what engineering case an outside entity can make for collaborating with the user of encrypted traffic, and whether the trade-offs and potential risks are worth it to the user.
However, the answers cannot be specific to individual users; determinations and guidance need to be general, since the encryption boundary is inevitably an application used by many people. Trade-offs must make sense to users who are unlikely to be thinking about network management considerations. Harms need to be preemptively reduced because, in general terms, few users would choose network management benefits over their own privacy if given the choice.
Some have found that there appears to be little, if any, evidence that encryption causes network problems that are meaningful to the user. Since alignment on problem solving is a prerequisite to collaboration on a solution, it does not seem that collaboration across the encryption boundary is called for.
Even with the wide-scale deployment of encryption in new protocols and of techniques that prevent passive observers of network traffic from knowing the content of exchanged communications, important information, such as which parties communicate and sometimes even which services have been requested, may still be deducible. The future is to conceal more data and metadata from passive observers and also to minimize information exposure to second parties (where the user is the first party) by, perhaps counterintuitively, introducing third-party relay services to intermediate communications. As discussed in [
KUEHLEWIND], a relay is a mechanism that uses additional levels of encryption to separate two important pieces of information: knowledge of the identity of the person accessing a service and knowledge of the service being accessed. By contrast, a VPN uses only one level of encryption and does not separate identity (first party) and service (second party) metadata.
Relay mechanisms termed "oblivious", emerging specifications for privacy-preserving measurement (PPM), and protocols like Multiplexed Application Substrate over QUIC Encryption (MASQUE) are being discussed in the IETF. In such schemes, users are ideally able to share their identity only with the entity they have identified as trusted; that data is not shared with the service provider. This is more complicated for network management, but there may be opportunities for better collaboration between the network and, say, the application or service at the endpoint.
A queriable relay mechanism could preserve network management functions that are disrupted by encryption, such as TCP optimization, quality of service, zero-rating, parental controls, access control, redirection, content enhancement, analytics, and fraud prevention. Instead of encrypting communication between only two ends with passive observation by all on-path elements, intermediate relays could be introduced as trusted parties that get to see limited information for the purpose of collaboration between in-network intermediary services.
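The following conceptual Python sketch illustrates the layered encryption described above, using symmetric Fernet keys from the "cryptography" package as a stand-in for the public-key encryption a real oblivious protocol would use; the forwarding hint, gateway name, and message framing are invented for illustration.

   # Two layers of encryption: the relay learns who is sending but not what;
   # the gateway learns what is requested but not by whom. Illustrative only.
   from cryptography.fernet import Fernet

   relay_key, gateway_key = Fernet.generate_key(), Fernet.generate_key()

   # Client: encrypt the request for the gateway, then wrap it for the relay.
   inner = Fernet(gateway_key).encrypt(b"GET https://service.example/resource")
   outer = Fernet(relay_key).encrypt(b"forward-to: gateway.example|" + inner)

   # Relay: sees the client's address and a forwarding hint, not the request.
   hint, _, forwarded = Fernet(relay_key).decrypt(outer).partition(b"|")

   # Gateway: sees the request, but only the relay as its network peer.
   request = Fernet(gateway_key).decrypt(forwarded)
   print(hint.decode(), "->", request.decode())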
Out of all of the possible network management functions that might be ameliorated by proxying, the ability to control congestion in encrypted communications has been researched in depth. Existing techniques rely on TCP performance-enhancing proxies (PEPs) that either entirely intercept a TCP connection or interfere with the transport information in the TCP header. However, not only do new encrypted transport protocols limit any such in-network interference, but such interference can also have a negative impact on the evolvability of these protocols. Therefore, a new approach was presented where, instead of manipulating existing information, additional information is sent using a so-called sidecar protocol independent of the main transport protocol that is used end to end [
WELZL]. For example, sidecar information can contain additional acknowledgments to enable in-network local retransmission or faster end-to-end retransmission by reducing the signaling round-trip time.
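As a purely hypothetical illustration of the kind of auxiliary message such a sidecar protocol might carry (the field layout below is invented and is not the protocol described in [WELZL]), an on-path element could exchange compact acknowledgments about the encrypted flow without touching the end-to-end transport headers.

   # Hypothetical sidecar acknowledgment: magic, flow identifier, highest
   # sequence number observed for that flow. Layout invented for illustration.
   import struct

   SIDECAR_ACK = struct.Struct("!4sIQ")

   def build_ack(flow_id: int, highest_seq: int) -> bytes:
       return SIDECAR_ACK.pack(b"SCAR", flow_id, highest_seq)

   def parse_ack(datagram: bytes) -> tuple[int, int]:
       magic, flow_id, highest_seq = SIDECAR_ACK.unpack(datagram)
       if magic != b"SCAR":
           raise ValueError("not a sidecar acknowledgment")
       return flow_id, highest_seq

   print(parse_ack(build_ack(flow_id=7, highest_seq=123456)))  # (7, 123456)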
Taking the user privacy benefits as a given, there is a need to investigate and compare the performance of various encrypted traffic configurations, such as the use of an additional sidecar protocol or explicit, encrypted, and trusted communication with the network using MASQUE, against existing techniques such as TCP PEPs.
One size fits all? On the issue of trust, different networks or devices will have different trust requirements for devices, users, or each other, and vice versa. For example, imagine two networks with very different security requirements, such as a home network that must protect its child users versus the network of a national security institution. How could one network architecture meet the needs of all use cases?
Does our destination have consequences? Ubiquitous, strong encryption of network traffic may itself have future consequences if it causes intermediaries to poke holes in what are supposed to be long-term solutions for user privacy and security.
Can we bring the user along? While there has been a focus on the good reasons why parties might collaborate across the encryption barrier, there will always be others who want to disrupt that in order to exploit the data for their own gain, and sometimes exploitation is called innovation. High-level policy mitigations have exposed how powerless end users are against corporate practices of data harvesting. Yet interfaces that help users understand these lower-layer traffic flows well enough to protect their financial transactions or privacy do not yet exist. That means engineers must make inferences about what users want. Instead, these relationships and trade-offs should be made more visible.
The third day focused on techniques that could be used to improve the management of encrypted networks.
The potential paths forward described in the presentations included some element of collaboration between the networks and the subscribing clients that simultaneously want both privacy and protection. Thus, the central theme of the third day became negotiation and collaboration.
For enterprise networks where client behavior is potentially managed, [
COLLINS] proposes "Improving network monitoring through contracts", where contracts describe different states of network behavior.
Because network operators have a limited amount of time to focus on problems and process alerts, contracts and states let the operator focus on a particular aspect of a current situation or problem. The current estimate is that a Security Operations Center (SOC) operator can handle about 10 events per hour. Operators must work within the limits imposed by their organization and must pick among options that frequently only frustrate attackers -- preventing attacks entirely is potentially impossible. Finally, operators must prioritize so that they can manage as many events as possible.
Validating which alerts are true positives is challenging because unusual traffic creates many anomalies, and not all anomalies are malicious events. Identifying with any level of certainty which anomalous traffic is rooted in malicious activity is extremely challenging. Unfortunately, applying the latest machine-learning techniques has produced mixed results. To make matters worse, the large amount of Internet-wide scanning results in endless traffic that is technically malicious but mostly creates information overload and complicates event prioritization. Any path forward must free up analyst time to concentrate on the more challenging events.
The proposed contract solution is to define a collection of acceptable behaviors comprising different states that might include IP addresses, domain names, and indicators of compromise. Deviation from a contract might indicate that a system is acting outside its normal mode of behavior or even that a normal mode of behavior is suddenly missing. An example contract might be "this system is expected to update its base OS once a day". If this does not occur, the expectation has not been met, and the system should be checked, as it failed to call home to look for (potentially security-related) updates.
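A simplified Python sketch of checking that example contract follows (this is not the system proposed in [COLLINS]; the event log, alert text, and 24-hour window are illustrative).

   # Flag hosts whose expected daily OS update check has not been observed.
   from datetime import datetime, timedelta

   CONTRACT_WINDOW = timedelta(hours=24)

   def contract_violations(update_events: list[datetime],
                           now: datetime) -> list[str]:
       alerts = []
       if not update_events or now - max(update_events) > CONTRACT_WINDOW:
           alerts.append("expected daily OS update check missing; inspect host")
       return alerts

   events = [datetime(2023, 3, 1, 9, 30)]        # last observed update check
   print(contract_violations(events, now=datetime(2023, 3, 3, 9, 0)))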
Within the IETF, the Manufacturer Usage Description Specification (MUD) [
RFC 8520] defines one subset of such contracts. Note that contracts are likely to succeed only in a constrained environment with well-defined expectations that is maintained by operational staff; they may not work in an open Internet environment where end users drive all network connections.
The world is not only shifting to increased encrypted traffic but is also encrypting more and more of the metadata (e.g., DNS queries and responses). This makes network policy enforcement by middleboxes significantly more challenging. The result is a significant tension between security enforcement and privacy protection.
Goals for solving this problem should include enabling networks to enforce their policies but should exclude weakening encryption or requiring the deployment of new server software. Existing solutions fail to meet at least one of these goals.
The cryptographic technique of a "zero-knowledge proof" (ZKP) [
GRUBBS] may be one path forward to consider. A ZKP allows one party to prove to another that a statement is true without revealing anything beyond the validity of the statement itself. Applying this to network traffic has been shown to allow a middlebox to verify that traffic to a web server complies with a policy without revealing the actual contents. This solution meets the three criteria listed above, and using ZKPs within TLS 1.3 traffic turns out to be plausible.
An example engine using encrypted DNS was built to test the ZKP approach. Clients created DNS requests for names that were not on a DNS block list, and middleboxes could verify, without learning the exact request, that each request was not on the prohibited list. Although the result was functional, the computational overhead was still too high, and future work will be needed to decrease the ZKP-imposed latencies.
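Reproducing the system from [GRUBBS] is well beyond a sketch, but the ZKP primitive itself can be illustrated compactly. The Python fragment below is a non-interactive Schnorr proof (using the Fiat-Shamir heuristic): the prover demonstrates knowledge of a secret exponent x with y = g^x mod p without revealing x, and the verifier plays a role analogous to the middlebox above. The group parameters are demonstration values only, not a vetted configuration.

   # Non-interactive Schnorr proof of knowledge of a discrete logarithm.
   # Demonstration parameters; real deployments use standardized groups.
   import hashlib, secrets

   p = 2**255 - 19        # prime modulus (demo choice)
   q = p - 1              # exponents reduced modulo p - 1 (Fermat)
   g = 5                  # base used for the demo

   def challenge(y: int, t: int) -> int:
       return int.from_bytes(hashlib.sha256(f"{g}|{y}|{t}".encode()).digest(),
                             "big") % q

   def prove(x: int, y: int) -> tuple[int, int]:
       r = secrets.randbelow(q)
       t = pow(g, r, p)                       # commitment
       return t, (r + challenge(y, t) * x) % q

   def verify(y: int, t: int, s: int) -> bool:
       return pow(g, s, p) == (t * pow(y, challenge(y, t), p)) % p

   x = secrets.randbelow(q)                   # prover's secret
   y = pow(g, x, p)                           # public value
   print(verify(y, *prove(x, y)))             # True, yet x is never revealed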
The principal challenge being studied is how to handle the inherent conflict between filtering and privacy. Network operators need to implement policies and regulations that can originate from many sources (e.g., security, governmental, or parental requirements). Conversely, clients need to protect their users' privacy and security.
Safe browsing, originally created by Google, is one example of a mechanism that tries to meet both sides of this conflict. It would be beneficial to standardize this and other similar mechanisms. Operating systems could continually protect their users by ensuring that malicious destinations are not being reached. This would require some coordination between cooperating clients and servers offering protection services. These collaborative solutions may be the best compromise to resolve the tension between privacy services and protection services [
PAULY].