
Introducing a New Deploy360 Topic: Routing Resiliency / Routing Security


How reliable and secure is the Internet’s underlying routing infrastructure? How well does it hold up in the face of a major event such as the recent Hurricane Sandy that hit the US? How well can it withstand attacks and misconfiguration errors?  As we continue to move more and more of our communication into the “cloud” of the Internet, how secure and reliable is the underlying routing fabric that holds it all together?

Over the past year here at Deploy360, we have been talking a great deal about how we need to get IPv6 deployed to enable more connections to the Internet… more networks, more devices, more “Internet of Things” and more people as there are still 5 billion people yet to get online.  We’ve also been talking about how we need to get DNSSEC more widely deployed to create a more secure Internet and to enable a whole new realm of innovations such as the DANE protocol that can create a stronger security layer.

But it’s become increasingly clear to us that as we get more people connected to the Internet and even as we add security layers like DNSSEC, there is another area where we need to greatly increase the conversation.

The truth is… the Internet today IS highly reliable, even in the cases of events like Hurricane Sandy. The Internet, as we like to say, “routes around damage.”  Even in the face of malicious attacks on sections of the Internet, the overall network has continued to function.

But…

… as the Internet continues to evolve and the number of network operators expands… as we bring the next billion people online… as we interconnect even more devices and things… we need to ensure that the Internet’s underlying routing infrastructure is both reliable and secure.  There is room today for improvement.

A New Topic: Routing

And so we are launching a new area on our site that we are calling simply “Routing,” where we will focus on providing real-world deployment information to the global operator community related to “routing security” and “routing resiliency.”

The term “resiliency” is an important one, and a common definition for a network is:

the ability of the network to provide and maintain an acceptable level of service in the face of various faults and challenges to normal operation.

Ultimately that is our goal – doing what we can to work with the operator community to ensure the resilience of the Internet’s routing infrastructure.  A part of that is “routing security,” but the topic is really much larger and dives into operational practices, policies and other areas.

As we have with IPv6 and DNSSEC (and will be continuing to do as we build out our roadmaps for those topics), we’ll start with a foundation of information including:

  • Reports and studies on best current operational practices (BCOPs) for routing resiliency and security
  • Case studies of how BCOPs are deployed and effectively used – as well as case studies of recent routing incidents
  • Tools that can be used to help better understand how resilient and secure your routing infrastructure is
  • Sites with statistics and data to help you understand the overall situation

We’ll focus on finding or creating the best tutorials, whitepapers, reports, videos, statistics, sites and tools, just as we’ve done with IPv6 and DNSSEC. As in the other topics, we’ll be looking to promote resources created by many of you who are reading this message.  And where we can’t find resources others have created, we’ll go ahead and create them either ourselves or through partners. We’ll also naturally be adding in routing-related posts to our constant stream of more news-related blog posts.

Note that this “routing resiliency/security” topic will be a bit different than our other areas in that we are not focusing on a specific protocol but rather on a broader topic.

Certainly over the next few months after we’ve built the foundation we will explore some of the protocols that are being discussed now within the IETF such as Secure BGP (BGPSEC) and the Resource Public Key Infrastructure (RPKI) – but they will again be discussed within this broader context of how they are part of the puzzle – “building blocks,” really – of making the Internet more resilient and secure.  We’ll also be integrating and promoting some of the routing security work we’ve been doing for some time now, such as the routing security “operator roundtables” we’ve held.

It’s an ambitious topic … and more than one person has said to us something like “Wow! Making DNSSEC and IPv6 interesting was hard enough… now you are going to dive down into BGP and the guts of routing? Are you crazy?” And yes, we’re aware that the community of people who even know about all this stuff is tiny, let alone those who really understand it.

But that’s what we want to change!  We want more people to understand how the Internet really works down underneath, so that they, too, can understand what we need to do to ensure it continues to be the vibrant Internet we’ve come to expect.

It’s important, too, for the future of the open Internet… and for the billions of people and devices yet to connect.  As a report from ENISA so nicely puts it:

There may well not be an immediate cause for concern about the resilience of the Internet interconnection ecosystem, but there is cause for concern about the lack of good information about how it works and how well it might work if something went very badly wrong.

We aim to help change that!

How You Can Help

Want to join us in this quest to improve routing resiliency and security?  While we’re starting to add resources and pages to the site, there are a couple of ways you can help us out:

1. Read the reports we’ve listed. You may want to start with the excellent report, “Inter-X: Resilience of the Internet Interconnection Ecosystem,” that summarizes the situation and offers suggestions for how to move forward.  The 31-page summary document is enough to get started … although the truly hard-core may enjoy the 239-page “full” report. From there you can move on to the other documents for a deeper understanding.

2. Send us suggestions – if you know of a report, whitepaper, tutorial, video, case study, site or other resource we should consider adding to the site, please let us know. We have a list of many resources that we are considering, but we are always looking for more.

3. Volunteer – if you are very interested in this topic and would like to actively help us on an ongoing basis, please fill out our volunteer form and we’ll get you plugged in when we get the volunteer effort going in the next few months.

4. Help us spread the word – As we publish resources and blog posts relating to routing resiliency / security, please help us spread those links through social networks so that more people can learn about the topic.

With your help, we can build out this Routing area of Deploy360 to be an outstanding resource for the Internet community and to help make the Internet more resilient and secure!


ENISA Report: Resilience of the Internet Interconnection Ecosystem

Seeking to understand routing resiliency and routing security? In this April 2011 report, “Inter-X: Resilience of the Internet Interconnection Ecosystem,” the European Network and Information Security Agency (ENISA) provides an extremely thorough understanding of the complex ecosystem of connections between networks.

This document is highly recommended to anyone looking to understand how the Internet operates – and where there are opportunities for improvement.

As noted on the introductory web page, the study:

…looks at the resilience of the Internet interconnection ecosystem. The Internet is a network of networks, and the interconnection ecosystem is the collection of layered systems that holds it together. The interconnection ecosystem is the core of the Internet, providing the basic function of reaching anywhere from everywhere.

where “resilience” is defined as:

the ability to provide and maintain an acceptable level of service in the face of various faults and challenges to normal operation.

The comprehensive study outlines the challenges to both measuring the infrastructure of the Internet and to understanding the resilience of the network.  A key point is:

There may well not be an immediate cause for concern about the resilience of the Internet interconnection ecosystem, but there is cause for concern about the lack of good information about how it works and how well it might work if something went very badly wrong.

The report sets out to capture a good bit of that information and to lay out recommendations about how further work may be undertaken.  The document is available in two versions:

  • a 31-page “Executive Summary” report (PDF) that presents the major findings and recommendations and provides a decent tutorial into the issues and challenges.
  • a 239-page “Full” report (PDF) that goes into great detail about the “state of the art” with regard to routing and Internet interconnections, includes a section about how the report was developed and then includes a lengthy bibliography that is very useful in and of itself.

While originating in Europe, the document and its recommendations are globally applicable.

For a taste of the document, here is the table of contents of the Executive Summary report:

1 Summary

  • 1.1 Scale and Complexity
  • 1.2 The Nature of Resilience
  • 1.3 The Lack of Information
  • 1.4 Resilience and Efficiency
  • 1.5 Resilience and Equipment
  • 1.6 Service Level Agreements (SLAs) and ‘Best Efforts’
  • 1.7 Reachability, Traffic and Performance
  • 1.8 Is Transit a Viable Business?
  • 1.9 The Rise of the Content Delivery Networks
  • 1.10 The “Insecurity” of BGP
  • 1.11 Cyber Exercises on Interconnection Resilience
  • 1.12 The “Tragedy of the Commons”
  • 1.13 Regulation

2 Recommendations

  • Incident Investigation
  • Data Collection of Network Performance Measurements
  • Research into Resilience Metrics and Measurement Frameworks
  • Development and Deployment of Secure Inter‐domain Routing
  • Research into AS Incentives that Improve Resilience
  • Promotion and Sharing of Good Practice on Internet Interconnections
  • Independent Testing of Equipment and Protocols
  • Conduct Regular Cyber Exercises on the Interconnection Infrastructure
  • Transit Market Failure
  • Traffic Prioritisation
  • Greater Transparency – Towards a Resilience Certification Scheme

More information about the report can be found on the ENISA web site.