Introduction

The Border Gateway Protocol (BGP) is the central protocol of the internet, responsible for configuring routing tables of autonomous systems (ASes) across the globe. Unfortunately, it is incredibly vulnerable, and attempts to bolt cryptography onto BGP have been largely unsuccessful. This is partly because of a mismatch between how BGP works and how cryptographic signatures work. In this post we will explore an alternate way of securing BGP using Proof-Carrying Data. This is particularly relevant because:

  1. BGPSec is still not widely deployed and (as we shall see), we can still do something else.
  2. BGPSec requires a lot of signatures and the transition to Post-Quantum digital signatures will only exacerbate the problems with BGPSec: since the signature sizes are at least an order of magnitude larger than current signatures (even compared to RSA), or orders of magnitude slower to verify (as with Isogeny-based signatures).
  3. Proof-carrying data is increasingly practical, with recent practical constructions being able to rely only on the security of standard hash functions (e.g., Keccak or Groestl). Even though it is “moon math” it can be instantiated incredibly conservatively with decent performance.

To explore the problems with BGPSec and the ways that Proof-Carrying Data could possibly help alleviate them, we start with a simplified introduction to routing on the internet.

A Short Primer on Routing

The internet consists of routers connected by links. A link is a physical connection between two routers (think fiber optic cable) and a router is just a device (think computer) that receives packets on one link and forwards them on another. Routing is the process of deciding on which link to forward a packet.

Routing

To help routers make these decisions, every packet on the internet comes with a destination address, and the task of every router is to help get the packet as “close” to its destination as possible: getting there might require multiple hops and the router locally decides which link would get the packet closer to the destination. To do this, routers exploit the hierarchical structure of the IP address space: the more bits that two addresses have in common, the “closer” they are in the network.

This enables routing to work as follows: internally each router maintains a table mapping subnets, groups of IP addresses with a shared prefix, to links. When a packet arrives, the router examines the destination IP address and forwards the packet to the link associated with the most specific subnet (longest prefix) that contains (matches) the destination IP address. For example a router might have the following routing table:

SubnetLink
180.1.0.0/161
180.0.0.0/82
0/03

If it receives a packet with destination IP address 180.1.7.1 it would forward the packet to link 1, because 180.1.0.0 is the most specific subnet that contains 180.1.7.1. If the packet had destination IP address 200.10.61.10 it would forward the packet to link 3, because 0/0 is the most specific subnet in the routing table containing 200.10.61.10.

This process is completely analogous to the postal system: if you are in New York and post a letter to your friend in Berlin, the USPS will first forward the letter to the post office in Germany, then Deutsche Post will forward the letter to an office in Berlin and finally the Berlin branch will deliver the letter to your friend’s house: the address gets more and more specific at every hop, this prevents the USPS from having to know how to deliver a letter to every house in Germany, instead they match the longest prefix they know how to reach, “Germany”, to a link – an airplane heading to Germany. It’s then the job of Deutsche Post to match the rest of the suffix.

The Three Napkin Protocol

Back in the day, people would manually configure these tables on each router. That worked fine when the internet was small: you could simply call up your fellow network administrators and ask them to update their tables if you had a new more-specific subnet. But as the internet grew, it became clear that this manual configuration was not scalable, something had to be done, the computers were going to have to work this thing out amongst themselves…

The answer was to run a distributed path-finding algorithm among the routers. This protocol would become the Border Gateway Protocol (BGP), also referred to as the “three napkin protocol”:

As the prospect of system meltdown loomed, the men began scribbling ideas for a solution onto the back of a ketchup-stained napkin. Then a second. Then a third. The “three-napkins protocol” as its inventors jokingly dubbed it, would soon revolutionize the Internet.

“The Long Life of a Quick Fix”

In principle BGP is beautifully simple: routers exchange messages with their neighbors, telling them which subnets they know how to reach and how many hops away they are. Routing then consists of picking the neighbor (link) claiming that they can reach the most specific subnet with the fewest hops. Technically, the routers are grouped into Autonomous Systems (AS), which roughly speaking, corresponds to network operators. Within an AS, the operator is responsible for routing, it’s their little kingdom – in practice by running BGP within the AS as well.

The Original Three Napkins

In slightly more detail, BGP works as follows: suppose an AS wants to announce a new subnet, effectively saying “I have this subnet and any traffic destined for it should come to me”. To do this it sends a BGP update message to its neighbors stating that the subnet is reachable through it with zero hops (crossing zero links). Each neighbor then forwards a message to their neighbors stating that the subnet is reachable through them with one hop, their neighbors then forward the message to their neighbors with two hops, and so on. Eventually, the update will have reached every router on the internet and they will know which link leads to the shortest path to the subnet.

The Issue

BGP works great if everyone is competent and plays fair.

Problem is, they aren’t and they don’t.

Whether it is Pakistan attempting to block YouTube domestically and accidentally blackholing YouTube across the globe. Or China Telecom doing BGP-hijacking, causing US traffic to be routed through China for surveillance purposes. It is fair to say, people have not been competent or playing fair in BGP land. The situation is pretty dire and the only saving grace is that increasing amounts of internet traffic are encrypted.

Broadly speaking there are two types of attacks on BGP:

  • Announcing subnets you do not own.
  • Lying about the number of hops to a destination.

The natural response to both these attacks is:

  • Add a PKI for assigning ownership of subnets to ASes.
  • Sign routing updates in a way that prevents “stripping” ASes from the middle of the path.

Not surprisingly, that is pretty much what people proposed to do: the first is called RPKI and the second is BGPSec. Signing the BGP updates enables everyone to verify that the update follows a route from the destination by checking a chain of signatures. In a nutshell:

  • When an AS receives a route update, it comes along with a list of the ASes that the update has passed through and a signature from each AS. It then checks the signature of each AS on the path.

  • When propagating an update, the AS signs the entire list of ASes on the path to the destination and the AS that it is forwarding the update to. This is done to prevent an attacker “stripping” ASes from the middle of path: to make the number of hops to the destination appear shorter than it was.

The Problems with BGPSec

The problem with bolting signatures onto BGP is that it fundamentally changes the way that BGP works: previously, an update could contain multiple subnets which could also be merged. Because of the “hierarchical” nature of the IP address space, this dramatically reduces the number of routes that need to be propagated. For instance, if I am an AS and my routing table contains two subnets 180.128.0.0/9 and 180.0.0.0/9 I may advertise that I know how to reach the larger subnet 180.0.0.0/8. But with BGPSec, this is no longer possible: doing so would break the signatures on the BGP announcements I have to forward:

To protect the integrity of these signatures, routers can no longer pack multiple prefixes into a single update, and because the AS number of the receiving AS is in the update, a separate update (with a separate signature) must be generated for each neighbor.

Noction

The result is a large increase in:

  • Communication.
    Because we need to propagate a chain of signatures for every prefix.
  • Computation.
    Because we need to verify signatures for every hop on the route to every subnet.

This makes BGPSec a rather expensive proposition and one of the reasons why it has not been widely deployed:

Deployment of BGPSec will be challenging, as the protocol is quite resource-intensive. BGP updates will be larger due to the inclusion of signatures and supporting information. They will also be more numerous because each update can only carry a single prefix. Processing updates will take more time as each AS hop in the AS path requires a signature check.

Noction

But what if there was a way to “compress” the signatures and subnets into a single update in a verifiable way? What if an AS could simply prove that it operated honestly without passing along the entire chain of signatures? And what if it could simply prove that it verified the “Route Origin Authorization” ownership of the subnet?

This is where proof-carrying data comes in…

Proof-Carrying Data (PCD)

Proof-Carrying Data (PCD), not to be confused with Proof-Carrying Code (PCC), was introduced by Chiesa and Tromer in 2010. As the name suggests, it is data that carries a cryptographic proof along with it attesting that the data has been produced in some particular “allowed” way. This means that there are some set of “allowed transformations” that may have been employed recursively and PCD guarantees (cryptographically) that only those transformations have been applied.

You can consider PCD as a kind of “cryptographic type system”: you have a library of allowed functions that you can apply and PCD ensures that you can only apply the functions in the library to inputs that have been produced by the same set of library functions.

In practice this primitive is constructed using recursive verification of SNARKs or so-called accumulation schemes. The exact techniques are not important for this post, but what is relevant is that these techniques are now practical: taking on the order of a few seconds to produce a proof and a few milliseconds to verify the proof, regardless of how many transformations have been applied. Proof sizes range from 100’s of KB all the way down to a couple of KB. Applying Proof-Carrying Data to BGP is a matter of “moving” some part of the BGP router logic into a circuit (relation) whose execution is then proven using a PCD proof. This forces everyone to “act honestly” – basically they can only run the honest routing code.

PCD to your BGP

Perhaps everything is still a bit abstract, so let’s make it concrete. Below I have created an overly simplified set of PCD relations (“allowed transitions”) for BGP, in this simplified example there are three types of allowed transformations:

  • Create an update from a “Route Origin Authorization” proving ownership of a subnet.
  • Adding a hop to the path of an update.
  • Combining two updates into a single update (merging subnets).

Let’s go through these one by one. I will use Rust-like pseudocode to illustrate the idea. The arguments to each function are previous PCD objects: a function can only be applied to an object if the object is the result of running one of the allowed functions on a previous object.

Transition: Convert Resource Origin Authorization to Update

To get started we must have a function which takes no arguments and returns a routing update: some “allowed transition” that takes nothing and returns an “initial” routing update:

The pseudocode for this transition is as follows:

const ROOTS_OF_TRUST: [PublicKey] = ...;

/// Turns a "Route Origin Authorization" into a path of length 1
fn transition_roa() {
    // verify the Route Origin Authorization against the roots of trust
    verify_roa(ROOTS_OF_TRUST, roa);

    // produces a path of length 1:
    // from "target" the subnets "roa.subnets" are reachable in 1 hop
    return Update {
        target,    // the target of the update (next AS on the path)
        length: 1, // the length of the path
        subnets: roa.subnets
    };
}

This function converts a “Route Origin Authorization” (ROA) – a certificate (using the RPKI) that an AS owns a set of subnets, into a path of length 1 from the target to the owner of the subnet. In slightly more detail:

  • The roa is taken as a “witness”, a local variable that the prover can assign arbitrarily: it does not need to have been produced by any previous “allowed transition.” However, the ROA is then validated against the ROOTS_OF_TRUST for RPKI (a global constant).
  • The target is the AS number of the AS that the update is going to be forwarded to, this is also freely chosen by the prover.
  • Finally it returns an Update object with an arbitrary target and the subnets from the ROA. This is a PCD object, which comes along with a proof that the object has been produced by one of the allowed transitions, in this case transition_roa.

Note that the Update object only contains the next hop in the path and (unlike BGPSec) does not contain the entire path/chain of signatures, therefore the size of the Update object is constant regardless of the number of hops in the path, similarly the verification time is constant. As we shall see, it only grows with the number of disjointed subnets being announced.

Transition: Propagating The Update, Adding a Hop.

The Update object is passed to the target AS, which must then add a hop to the path before announcing it to its neighbors. To this end, we have a function that takes an Update object, creates a new Update object which has the path length one greater and is signed by the target AS of the previous update.

It looks something like this:

/// Adds a hop to the path
fn transition_hop(update) {
    // verify the certificate
    verify_cert(ROOTS_OF_TRUST, cert);
    assert(cert.AS_num == update.target);

    // construct a new update with the path length increased by 1
    let new_update = Update {
        target: new_target // AS number of the next AS
        length: update.length + 1,
        subnets: update.subnets,
    };

    // verify that the target signed the update
    verify_signature(cert.pk, sig, new_update);

    // return the new update
    return new_update;
}

Because the logic of the function is enforced cryptographically, it is only possible to produce a new valid Update object by increasing the length and by proving ownership of the target AS. Like the sequence of signatures in BGPSec, this prevents “stripping” of intermediate ASes from the path: “pretending” to be “closer” requires an update with you as the target and the resulting path length will be one greater.

Transition: Merging Updates

One of the biggest problems with BGPSec is that each update can only carry a single prefix. We can solve this problem by allowing an AS to verifiably merge two updates into a single update.

Note that the function below only merges two updates, but it can be applied recursively to merge any number of updates into a single update.

/// Joins two sets of subnets into a single path announcement
/// Concatenating and merging subnets as needed
fn transition_merge(update1, update2) {
    // both updates must have the same target
    assert(update1.target == update2.target);

    // both updates must have the same length
    assert(update1.length == update2.length);

    // merge the subnets
    return Update {
        length: update1.length,
        subnets: merge_subnets(update1.subnets, update2.subnets),
    };
}

Final Thoughts

The code above is very simplified and ignores issues like certificate expiration, which requires providing a rough timestamp to the PCD circuit as “public input”. Similarly, certificate revocation can be handled by proving non-inclusion in a CRL.

Nonetheless, I believe that PCD is the natural tool for ensuring integrity in distributed systems like BGP. Essentially, BGPSec is a naive implementation of Proof-Carrying Data, in which the entire witness (certificate chain, ROAs, etc.) is passed along with every update and each verifier must re-verify the entire computation. Proof-Carrying Data allows us to “compress” the witness into a single proof and avoid re-verifying the entire chain of signatures: for a path of length n the verification time is O(n) instead of O(n^2).

Furthermore, it turns out that we need to move relatively little of BGP into the PCD circuit: essentially only the verification of digital signatures and the merging of subnets. The actions BGP routers take based on the updates do not need to be incorporated into the circuits.

Combined, I think we could be very close to a practical implementation of BGPSec using PCD, if we are not already there…