How could a small Internet service provider (ISP) in Pennsylvania cause millions of websites worldwide to go offline? That’s what happened on June 24, 2019, when users around the world were left unable to access a large fraction of the web. The root cause was an outage suffered by Cloudflare, one of the Internet’s leading content hosts, upon which the affected websites relied.
Cloudflare traced the problem to a regional ISP in Pennsylvania that accidentally advertised to the rest of the Internet that the best available routes to Cloudflare were through its small network. This diverted a massive volume of global traffic to the ISP, overwhelming its limited capacity and so cutting off Cloudflare’s access to the rest of the Internet. As Cloudflare remarked, it was the Internet equivalent of routing an entire freeway through a neighborhood street.
This incident highlighted the shocking vulnerability of the Internet. In 2017 alone, there were about 14,000 incidents of this kind. Given that the Internet is mission-critical for much of the world’s economic and social life, shouldn’t it be designed to withstand not only minor hiccups but also major catastrophes? Shouldn’t there be safeguards so that small issues don’t turn into big problems? Governing bodies such as the EU Agency for Network and Information Security (ENISA) have long warned that such cascading incidents risk causing systemic Internet failure. Yet the Internet remains quite fragile.
Like a road network, the Internet has its own highways and intersections that consist of cables and routers. The navigation system that manages the flow of data around the network is called the Border Gateway Protocol (BGP). When you visit a website, BGP determines the path through which the site’s data will be transmitted to your device.
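To make the navigation analogy a little more concrete, here is a toy sketch in Python of how a router might pick between two advertised paths to the same destination. It is a deliberate simplification: real BGP weighs many attributes and local policies, whereas this sketch assumes the only rule is "fewest networks to cross". The `Route` class, the `best_route` function, the address block and the network numbers are all hypothetical, taken from ranges reserved for documentation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str               # destination address block
    as_path: Tuple[int, ...]  # networks the data would cross, destination last


def best_route(routes: List[Route]) -> Route:
    """Toy rule (an assumption): prefer the path that crosses the fewest networks."""
    return min(routes, key=lambda r: len(r.as_path))


# Two neighbors advertise a path to the same destination block.
advertised = [
    Route("203.0.113.0/24", (64500, 64501, 64510)),
    Route("203.0.113.0/24", (64502, 64510)),
]

chosen = best_route(advertised)
print(chosen.as_path)  # (64502, 64510)
```

Note that the router in this sketch simply believes whatever its neighbors advertise. Nothing in it checks whether an advertised path is genuine.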
The problem is that BGP was only designed to be a temporary fix, a “good enough” solution when the Internet was rapidly growing 25 years ago. Back then it helped the net sustain its explosive expansion, and it quickly became part of every backbone router that managed the flow of data down the Internet’s principal pathways. But it wasn’t built with security in mind, and mechanisms to ensure that the paths BGP sends data down are valid have never been added. As a result, routing errors go undetected until they cause congestion and outages.
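The missing validity check can be illustrated with a short Python sketch. It is not how any real router is implemented. It assumes a toy rule in which the shortest advertised path wins, and a hypothetical registry, `AUTHORIZED_ORIGIN`, recording which network is entitled to announce each address block. That registry loosely mirrors the idea behind route origin validation, an add-on to BGP rather than part of it. All names, address blocks and network numbers below are illustrative.

```python
from typing import Dict, List, Optional, Tuple

Advert = Tuple[str, Tuple[int, ...]]  # (prefix, path of networks with the origin last)

# Hypothetical registry of which network may originate each prefix.
AUTHORIZED_ORIGIN: Dict[str, int] = {"203.0.113.0/24": 64510}


def origin_is_valid(advert: Advert) -> bool:
    """True only if the last network in the path is the registered origin."""
    prefix, as_path = advert
    expected = AUTHORIZED_ORIGIN.get(prefix)
    return expected is not None and bool(as_path) and as_path[-1] == expected


def best_route(adverts: List[Advert], validate: bool) -> Optional[Advert]:
    """Pick the shortest path, optionally discarding unauthorized origins first."""
    candidates = [a for a in adverts if origin_is_valid(a)] if validate else adverts
    return min(candidates, key=lambda a: len(a[1]), default=None)


legitimate: Advert = ("203.0.113.0/24", (64500, 64501, 64510))
bogus: Advert = ("203.0.113.0/24", (64511,))  # a network falsely claiming the block

unchecked = best_route([legitimate, bogus], validate=False)
checked = best_route([legitimate, bogus], validate=True)
print(unchecked == bogus, checked == legitimate)  # True True
```

Without the check, the bogus advertisement wins simply because it looks shorter. With it, the bogus route is discarded. The sketch only verifies the origin of a route, not every network along the path, so it would not catch every kind of routing error.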
Even worse, anyone who can access a backbone router (and doing so is trivial for someone with the right knowledge and budget) can construct bogus routes to hijack legitimate data traffic, disrupt services and eavesdrop on communications. This means the modern Internet operates using an insecure protocol that is exploited on a daily basis to compromise the communications of governments, financial institutions, weapons manufacturers and cryptocurrencies, often as part of politically motivated cyber-warfare.
These issues have been known since 1998, when a group of hackers demonstrated to the U.S. Congress how easy it was to compromise Internet communications. Yet little has changed. Deploying the necessary cryptographic solutions has turned out to be as hard as changing the engines of an airplane in mid-flight.
In an actual aviation crisis, such as the recent problems with Boeing’s 737 MAX aircraft, regulators have the authority to ground an entire fleet until it is fixed. But the Internet has no centralized authority. Different parts of the infrastructure are owned and operated by different entities, including corporations, governments and universities.
[Image: Many paths to choose from. Credit: Greg Mahlknecht/OpenStreetMap, CC BY-SA]
The tussle between these different players means they don’t have incentives to make their own part of the Internet more secure. An organization would have to bear the significant deployment costs and operational risks that come with a switch to a new technology, but it wouldn’t reap any benefits unless a critical mass of other networks did the same.
The most pragmatic solution would be to develop security protocols that don’t need global coordination. But attempts to do this have also been impeded by the decentralized ownership of the Internet. Operators have limited knowledge of what happens beyond their own networks because companies want to keep their business operations secret.
As a result, today nobody has a complete view of our society’s most critical communications infrastructure. This hinders efforts to model the Internet’s behavior under stress, making it harder to design and evaluate trustworthy solutions.