In this section, we discuss a high-level architecture for providing the service described in the previous sections. This discussion is deliberately sketchy, focusing on broad concepts and skipping over details. The intent here is merely to provide an overall architecture, not an implementable specification. A more concrete example of how this might be specified is given in Section 9.
We start from the premise of [RFC 7340] that phone numbers can be associated with credentials that can be used to attest ownership of numbers. For purposes of exposition, we will assume that ownership is associated with the endpoint (e.g., a smartphone), but it might well be associated with a provider or gateway acting for the endpoint instead. It might be the case that multiple entities are able to act for a given number, provided that they have the appropriate authority. [RFC 8226] describes a credential system suitable for this purpose; the question of how an entity is determined to have control of a given number is out of scope for this document.
An overview of the basic calling and verification process is shown below. In this diagram, we assume that Alice has the number +1.111.555.1111 and Bob has the number +2.222.555.2222.
    Alice                Call Placement Service                  Bob
    --------------------------------------------------------------------

    Store Encrypted PASSporT for 2.222.555.2222 ->

    Call from 1.111.555.1111 ------------------------------------------>

                                      <-------------- Request PASSporT(s)
                                                      for 2.222.555.2222

                                      Obtain Encrypted PASSporT -------->
                                      (2.222.555.2222, 1.111.555.1111)

                                      [Ring phone with verified callerid
                                       = 1.111.555.1111]
When Alice wishes to make a call to Bob, she contacts the CPS and stores an encrypted PASSporT on the CPS indexed under Bob's number. The CPS then awaits retrievals for that number.
When Alice places the call, Bob's phone would usually ring and display Alice's number (+1.111.555.1111), which is informed by the existing PSTN mechanisms for relaying a calling party number (e.g., the Calling Party's Number (CIN) field of the Initial Address Message (IAM)). Instead, Bob's phone transparently contacts the CPS and requests any current PASSporTs for calls to his number. The CPS responds with any such PASSporTs (or dummy PASSporTs if no relevant ones are currently stored). If such a PASSporT exists, and the verification service in Bob's phone successfully decrypts it using his private key and validates it, then Bob's phone can present the calling party number information as valid. Otherwise, the call is unverifiable. Note that this does not necessarily mean that the call is bogus; because we expect incremental deployment, many legitimate calls will be unverifiable.
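The store-and-retrieve behavior described above can be sketched as a toy in-memory service. This is only an illustration under stated assumptions, not the protocol defined by this document: the class name `CallPlacementService`, its methods, and the fixed `blob_size` used for dummy responses are all hypothetical, and encryption is represented by opaque byte strings.

```python
import secrets
from collections import defaultdict


class CallPlacementService:
    """Toy in-memory CPS: opaque blobs indexed by the called number."""

    def __init__(self, blob_size=256):
        self._store = defaultdict(list)
        # Assumption: dummies are random bytes of a fixed, illustrative size.
        self._blob_size = blob_size

    def store(self, called_number, encrypted_passport):
        # The CPS never sees plaintext; it files the blob under the callee.
        self._store[called_number].append(encrypted_passport)

    def retrieve(self, called_number):
        blobs = list(self._store.get(called_number, []))
        if not blobs:
            # Answer with a dummy so an empty result is not directly visible.
            blobs = [secrets.token_bytes(self._blob_size)]
        return blobs


cps = CallPlacementService()
cps.store("+22225552222", b"ciphertext-for-bob")   # Alice stores before calling
pending = cps.retrieve("+22225552222")             # Bob's phone polls on ring
nothing = cps.retrieve("+33335553333")             # no call pending: dummy only
```

In this sketch the callee cannot tell a dummy from a real entry until decryption fails, which is the property the dummy responses are meant to provide.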
The primary attack we seek to prevent is an attacker convincing the callee that a given call is from some other caller C. There are two scenarios to be concerned with:
- The attacker wishes to impersonate a target when no call from that target is in progress.
- The attacker wishes to substitute himself for an existing call setup.
If an attacker can inject fake PASSporTs into the CPS or in the communication from the CPS to the callee, he can mount either attack. As PASSporTs should be digitally signed by an appropriate authority for the number and verified by the callee (see Section 7.1), this should not arise in ordinary operations. Any attacker who is aware of calls in progress can attempt to mount a race to substitute themselves as described in Section 7.4. For privacy and robustness reasons, using TLS [RFC 8446] on the originating side when storing the PASSporT at the CPS is RECOMMENDED.
The entire system depends on the security of the credential infrastructure. If the authentication credentials for a given number are compromised, then an attacker can impersonate calls from that number. However, that is no different from in-band STIR [RFC 8224].
A secondary attack we must also prevent is denial-of-service against the CPS, which requires some form of rate control solution that will not degrade the privacy properties of the architecture.
All that the receipt of the PASSporT from the CPS proves to the called party is that Alice is trying to call Bob (or at least was as of very recently) -- it does not prove that any particular incoming call is from Alice. Consider the scenario in which we have a service that provides an automatic callback to a user-provided number. In that case, the attacker can try to arrange for a false caller-id value, as shown below:
    Attacker            Callback Service            CPS               Bob
    --------------------------------------------------------------------
    Place call to Bob ---------->
    (from 111.555.1111)
                                 Store PASSporT for
                                 CS:Bob ------------->

    Call from Attacker (forged CS caller-id info) -------------------->

                                 Call from CS ------------------------> X

                                                   <-- Retrieve PASSporT
                                                            for CS:Bob

                                 PASSporT for CS:Bob ------------------------>

                                 [Ring phone with callerid =
                                  111.555.1111]
In order to mount this attack, the attacker contacts the Callback Service (CS) and provides it with Bob's number. This causes the CS to initiate a call to Bob. As before, the CS contacts the CPS to insert an appropriate PASSporT and then initiates a call to Bob. Because it is a valid CS injecting the PASSporT, none of the security checks mentioned above help. However, the attacker simultaneously initiates a call to Bob using forged caller-id information corresponding to the CS. If he wins the race with the CS, then Bob's phone will attempt to verify the attacker's call (and succeed since they are indistinguishable), and the CS's call will go to busy/voice mail/call waiting.
In order to prevent a passive attacker from using traffic analysis or similar means to learn precisely when a call is placed, it is essential that the connection between the caller and the CPS be encrypted as recommended above. Authentication services could store dummy PASSporTs at the CPS at random intervals in order to make it more difficult for an eavesdropper to use traffic analysis to determine that a call was about to be placed.
Note that in a SIP environment, the callee might notice that there were multiple INVITEs and thus detect this attack, but in some PSTN interworking scenarios, or highly intermediated networks, only one call setup attempt will reach the target. Also note that the success of this substitution attack depends on the attacker landing their call within the narrow window during which the PASSporT is retained in the CPS, so shortening that window will reduce the opportunity for the attack. Finally, smart endpoints could implement some sort of state coordination to ensure that both sides believe the call is in progress, though methods of supporting that are outside the scope of this document.
In order to prevent the flooding of a CPS with bogus PASSporTs, we propose the use of "blind signatures" (see [RFC 5636]). A sender will initially authenticate to the CPS using its STIR credentials and acquire a signed token from the CPS that will be presented later when storing a PASSporT. The flow looks as follows:
    Sender                                 CPS

    Authenticate to CPS --------------------->
    Blinded(K_temp) ------------------------->
    <------------- Sign(K_cps, Blinded(K_temp))
    [Disconnect]


    Sign(K_cps, K_temp)
    Sign(K_temp, E(K_receiver, PASSporT)) --->
At an initial time when no call is yet in progress, a potential client connects to the CPS, authenticates, and sends a blinded version of a freshly generated public key. The CPS returns a signed version of that blinded key. The sender can then unblind the key and get a signature on K_temp from the CPS.
Then later, when a client wants to store a PASSporT, it connects to the CPS anonymously (preferably over a network connection that cannot be correlated with the token acquisition) and sends both the signed K_temp and its own signature over the encrypted PASSporT. The CPS verifies both signatures and, if they verify, stores the encrypted PASSporT (discarding the signatures).
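The blinding, signing, and unblinding steps can be illustrated with textbook RSA blind signatures. The following is a minimal sketch and not the scheme this document mandates: it uses a toy key built from two small Mersenne primes, omits padding, and represents K_temp as random bytes. All function names are hypothetical, and none of this is safe for production use.

```python
import hashlib
import math
import secrets

# Toy RSA key standing in for K_cps. Far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(k_temp_public: bytes) -> int:
    """Map the temporary public key to an integer modulo N."""
    return int.from_bytes(hashlib.sha256(k_temp_public).digest(), "big") % N


def blind(m: int):
    """Sender: hide m behind a random factor r; returns (blinded, r)."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return (m * pow(r, E, N)) % N, r


def cps_sign(blinded: int) -> int:
    """CPS: sign the blinded value without learning m."""
    return pow(blinded, D, N)


def unblind(signed_blinded: int, r: int) -> int:
    """Sender: strip r to obtain an ordinary signature on m."""
    return (signed_blinded * pow(r, -1, N)) % N


def cps_verify(k_temp_public: bytes, signature: int) -> bool:
    """CPS, later and anonymously: check its own signature on K_temp."""
    return pow(signature, E, N) == digest(k_temp_public)


k_temp = secrets.token_bytes(32)      # stand-in for a fresh public key
blinded, r = blind(digest(k_temp))    # Blinded(K_temp)
token = unblind(cps_sign(blinded), r) # Sign(K_cps, K_temp)
```

Because the CPS only ever sees the blinded value during the authenticated session, it cannot link the token presented later to the sender who acquired it, yet `cps_verify` still confirms that the token was issued by the CPS.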
This design lets the CPS rate limit how many PASSporTs a given sender can store just by counting how many times K_temp appears; for example, CPS policy might reject storage attempts and require acquisition of a new K_temp after storing more than a certain number of PASSporTs indexed under the same destination number in a short interval. This does not, of course, allow the CPS to tell when bogus data is being provisioned by an attacker, simply the rate at which data is being provisioned. Potentially, feedback mechanisms could be developed that would allow the called parties to tell the CPS when they are receiving unusual or bogus PASSporTs.
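One way such a policy might look is a sliding-window counter keyed on the pair (K_temp, destination number). This is a sketch under assumptions: the class `TokenRateLimiter`, the limit of two stores, and the ten-second window are illustrative values chosen here, not values taken from this document. Time is passed in explicitly so that the behavior is easy to follow.

```python
from collections import defaultdict, deque


class TokenRateLimiter:
    """Limit stores per (K_temp, destination) within a sliding window."""

    def __init__(self, max_stores=3, window_seconds=10.0):
        self._max = max_stores
        self._window = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of accepted stores

    def allow(self, k_temp, destination, now):
        events = self._events[(k_temp, destination)]
        # Forget accepted stores that have fallen out of the window.
        while events and now - events[0] >= self._window:
            events.popleft()
        if len(events) >= self._max:
            return False  # sender must acquire a new K_temp or wait
        events.append(now)
        return True


limiter = TokenRateLimiter(max_stores=2, window_seconds=10.0)
decisions = [
    limiter.allow(b"k-temp-1", "+22225552222", now=0.0),
    limiter.allow(b"k-temp-1", "+22225552222", now=1.0),
    limiter.allow(b"k-temp-1", "+22225552222", now=2.0),   # over the limit
    limiter.allow(b"k-temp-1", "+22225552222", now=11.5),  # window has passed
]
```

The limiter sees only the anonymous K_temp and the destination, so it constrains the storage rate without learning which authenticated sender holds the token.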
This architecture also assumes that the CPS will age out PASSporTs. A CPS SHOULD NOT keep any stored PASSporT for longer than the recommended freshness policy for the "iat" value as described in [RFC 8224] (i.e., sixty seconds) unless some local policy for a CPS deployment requires a longer or shorter interval. Any reduction in this window makes substitution attacks (see Section 7.4) harder to mount, but making the window too small might conceivably age PASSporTs out while a heavily redirected call is still alerting.
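The aging behavior can be sketched as a store that drops entries older than a configurable maximum age on retrieval. The class `ExpiringPassportStore` and its interface are hypothetical; the sixty-second default mirrors the freshness interval cited above, and the caller supplies the current time so that the example is deterministic.

```python
class ExpiringPassportStore:
    """Toy store that ages out entries older than max_age_seconds."""

    def __init__(self, max_age_seconds=60.0):
        self._max_age = max_age_seconds
        self._entries = {}  # called number -> list of (stored_at, blob)

    def store(self, called_number, blob, now):
        self._entries.setdefault(called_number, []).append((now, blob))

    def retrieve(self, called_number, now):
        # Keep only entries still inside the retention window.
        fresh = [(t, b) for (t, b) in self._entries.get(called_number, [])
                 if now - t < self._max_age]
        if fresh:
            self._entries[called_number] = fresh
        else:
            self._entries.pop(called_number, None)
        return [b for (_, b) in fresh]


store = ExpiringPassportStore(max_age_seconds=60.0)
store.store("+22225552222", b"blob-1", now=0.0)
within_window = store.retrieve("+22225552222", now=30.0)
after_window = store.retrieve("+22225552222", now=61.0)
```

A deployment with a different local policy would only change `max_age_seconds`; a smaller value narrows the substitution window at the cost of possibly expiring PASSporTs for slowly alerting calls.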
An alternative potential approach to blind signatures would be the use of verifiable oblivious pseudorandom functions (VOPRFs, per [PRIVACY-PASS]), which may prove faster.