March 10, 2015 archive

TDYR 229 – Analysis of the HBO NOW DNSSEC Error That Caused Problems With Comcast And Google

TDYR 229 - Analysis of the HBO NOW DNSSEC Error That Caused Problems With Comcast And Google by Dan York

HBO NOW DNSSEC Misconfiguration Makes Site Unavailable From Comcast Networks (Fixed Now)

Wow! Talk about insanely bad timing… yesterday at Apple’s big event, HBO announced “HBO NOW”, a new streaming service available for only $15/month that will give you access to all of HBO’s content. This was great news for people who want to “cut the cord” and not have to pay for a cable TV subscription just to get HBO content. All you had to do was go to order.hbonow.com to get started.

One slight problem – the folks at HBO had signed the hbonow.com domain with DNSSEC, but had not done so correctly!

As a result, the many networks around the world that perform DNSSEC validation to ensure that customers are getting to the correct sites (versus being redirected to bogus sites for phishing or malware) were blocking customers from getting to the possibly bogus order.hbonow.com!

In the text below, I will:

  • Explain what appears to have happened with HBO’s misconfiguration of the hbonow.com domain.
  • Provide two examples confirming what seems to have happened.
  • Speculate on why this occurred.
  • Offer some suggestions on what we need to do next.

Comcast (and Google’s Public DNS) Blocks HBOnow.com

One of those networks here in the USA performing DNSSEC validation was, of course, Comcast, which has been validating since January 2012 and is both the largest Internet Service Provider (ISP) in North America and the largest cable TV provider. Comcast also owns NBCUniversal and creates its own video content. So, of course, people’s immediate reaction was to take to Twitter and blame Comcast.

But here’s the thing:

Comcast was CORRECT in blocking HBO’s site!

Because:

  1. The .COM top-level domain (TLD) had a DS record indicating that the hbonow.com domain was signed with DNSSEC.
  2. The hbonow.com domain did NOT have the corresponding DNSKEY record.
  3. Comcast’s DNSSEC-validating DNS resolvers identified the problem and blocked access to the site on the assumption that this could have been an attacker attempting to redirect people to an unsigned and potentially bogus website.

DNSSEC worked correctly to prevent people from going to a bogus site.

Unfortunately, the DNS records were “bogus” not because of an attacker but rather because of a misconfiguration on HBO’s end.

This is not the first time Comcast has dealt with a site with misconfigured DNSSEC records. Back in 2012 there was the issue with NASA.GOV, which turned out to be a problem with the changing of DNSSEC keys. Comcast and NASA provided a detailed explanation of what happened then. (And in another case of spectacularly bad timing, that outage occurred on the day of the SOPA/PIPA website protests, leading to accusations on Twitter that Comcast was deliberately blocking sites.)

UPDATE: I’ve had people tell me they also couldn’t get to the HBO NOW site on networks that use Google’s Public DNS Servers as their DNS resolvers – which makes sense because Google has been performing full DNSSEC validation on sites since May 2013.  (I just did not see anyone tweeting about this…)

HBO’s “Solution”

HBO appears to have “fixed” this issue by, unfortunately, simply removing the DNSSEC records and returning the domain to a completely unsigned / unprotected state. Once the incorrect DNS records age out of DNS resolver caches (based on their Time-To-Live (TTL)), or once those caches are flushed, the domain will resolve correctly and people will be able to get to the site.

I’ll speculate on what happened in a moment… but here is some confirmation of what occurred.

Confirmation Using DNSViz

Last night on Twitter, Jason Livingood, VP of Internet Services for Comcast (and also, in full disclosure, a member of the Internet Society Board of Trustees as well as a long-time participant in IETF standards activities), tweeted out a DNSViz analysis of the order.hbonow.com domain:

[Image: DNSViz status of order.hbonow.com]

It shows that there is a DS record for “hbonow.com” that points to a DNSKEY record with the ID (key tag) 51249. However, that DNSKEY record does not exist in the actual DNS records for “hbonow.com”.

This is why the failure occurred.
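For the curious, here is roughly how a DS record “points to” a DNSKEY. Per RFC 4034, the DS record carries a key tag (the 51249 above), which is a checksum over the DNSKEY’s RDATA, plus a digest, which is a hash over the owner name in DNS wire format followed by that same RDATA. The Python below is a minimal sketch of both calculations. The function names and the placeholder key bytes are my own illustration; this is not HBO’s actual key and not any particular library’s API.

```python
# Minimal sketch (after RFC 4034, Section 5.1.4 and Appendix B) of how a DS
# record fingerprints a DNSKEY. Names and the sample key are hypothetical.
import hashlib
import struct


def name_to_wire(name: str) -> bytes:
    """Encode a domain name in canonical (lowercase) DNS wire format."""
    wire = b""
    for label in (l for l in name.rstrip(".").lower().split(".") if l):
        encoded = label.encode("ascii")
        wire += bytes([len(encoded)]) + encoded
    return wire + b"\x00"  # the empty root label terminates the name


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key: bytes) -> bytes:
    """DNSKEY RDATA: 2-byte flags, 1-byte protocol, 1-byte algorithm, key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def key_tag(rdata: bytes) -> int:
    """Key tag checksum per RFC 4034 Appendix B (not for algorithm 1)."""
    ac = 0
    for i, byte in enumerate(rdata):
        ac += byte if i & 1 else byte << 8
    ac += (ac >> 16) & 0xFFFF
    return ac & 0xFFFF


def ds_digest(owner: str, rdata: bytes, digest_type: int) -> str:
    """Uppercase hex digest for a DS record (1 = SHA-1, 2 = SHA-256)."""
    hashers = {1: hashlib.sha1, 2: hashlib.sha256}
    return hashers[digest_type](name_to_wire(owner) + rdata).hexdigest().upper()


# Hypothetical KSK (flags 257, protocol 3, algorithm 7) with placeholder bytes.
rdata = dnskey_rdata(257, 3, 7, b"placeholder-public-key")
print(key_tag(rdata), ds_digest("hbonow.com.", rdata, 1))
```

If the DNSKEY that produced the published key tag and digest is no longer served by the zone, nothing in the zone can match the DS record, which is exactly what DNSViz flagged.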

When I look at DNSViz right now for the domain, the picture is different:

[Image: DNSViz of order.hbonow.com]

There is no DS record for hbonow.com, so there is no DNSSEC failure.

Confirmation Using Dig

On my own home office network, I (of course!) use a DNSSEC-validating resolver, and I found myself unable to get to the order.hbonow.com site. Presumably, while I was reading the news about the Apple event last night, my DNS resolver fetched the DNS records for hbonow.com (perhaps due to web browser “pre-fetching”), so the old records are still in my DNS resolver’s cache. When I go to a command line and type “dig +dnssec ds hbonow.com”, I get back the following:

;; QUESTION SECTION:
;hbonow.com. IN DS
;; ANSWER SECTION:
hbonow.com. 10697 IN DS 51249 7 1 90DC90D0578FCFDDF6ED5DE0B35E9652CD2396A8
hbonow.com. 10697 IN RRSIG DS 8 2 86400 20150315045041 20150308044041 13787 com. NAY+BNRi4c6rzLOyoFN4OPOGbbUFuDu/kfO37m00pKkSwXxhAa0qkTTQ HIvzeaFPY54hdJlqH1EzdUEDuL2Nz2stv7iQmsakBaHf3fjHpe2L9H4C Q+wk8yc1vmHdcaUhJyuWYalLwJqg8GWmCXUzWAc6JAoZTPOzF4yZkshp unE=

Notice the “51249” in the first line of the ANSWER section: it matches the ID of the missing DNSKEY shown in the first DNSViz image above.
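To make that DS line a bit easier to read, here is a minimal Python sketch that splits it into its named fields. The field order (key tag, algorithm, digest type, digest) and the number assignments come from the DNSSEC specifications and the IANA registries; the parser itself, its names, and the small lookup tables (only a subset of the registries) are my own illustration.

```python
# Minimal sketch: split a DS record from dig output into named fields.
from dataclasses import dataclass

# Subset of the IANA DNSSEC algorithm numbers and DS digest type numbers.
ALGORITHMS = {5: "RSASHA1", 7: "RSASHA1-NSEC3-SHA1", 8: "RSASHA256",
              13: "ECDSAP256SHA256"}
DIGEST_TYPES = {1: "SHA-1", 2: "SHA-256"}


@dataclass
class DSRecord:
    owner: str
    ttl: int
    key_tag: int
    algorithm: int
    digest_type: int
    digest: str


def parse_ds_line(line: str) -> DSRecord:
    """Parse one 'owner TTL IN DS tag alg type digest' line."""
    fields = line.split()
    owner, ttl, rr_class, rr_type = fields[:4]
    if rr_class != "IN" or rr_type != "DS":
        raise ValueError(f"not an IN DS record: {line!r}")
    key_tag, algorithm, digest_type = (int(f) for f in fields[4:7])
    digest = "".join(fields[7:])  # the digest may be split by whitespace
    return DSRecord(owner, int(ttl), key_tag, algorithm, digest_type, digest)


ds = parse_ds_line(
    "hbonow.com. 10697 IN DS 51249 7 1 90DC90D0578FCFDDF6ED5DE0B35E9652CD2396A8")
print(ds.key_tag, ALGORITHMS[ds.algorithm], DIGEST_TYPES[ds.digest_type])
# prints: 51249 RSASHA1-NSEC3-SHA1 SHA-1
```

So the .COM registry was vouching for a KSK with key tag 51249, algorithm 7, using a 40-hex-digit (20-byte) SHA-1 digest.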

When I connect to a system on another network and run the same dig command, I get a different response:

;; QUESTION SECTION:
;hbonow.com. IN ANY

There was no answer section, which means there is no DS record.  If I were on that network I would be able to get to the order.hbonow.com site.

The Time-To-Live (TTL) Issue

If you look back at that response from my network, you will see “10697” in the answer section. This is the number of seconds that the record will remain in the cache of my DNSSEC-validating resolver. In the time it has taken me to write this post, that number has dropped to “5224”, so there are about 87 minutes left until that record ages out and my system stops blocking access to the site.
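The TTL arithmetic is simple enough to sketch. This assumes the resolver counts the cached TTL down in real time and that the record is neither refreshed nor flushed in the meantime; the function names are just for illustration.

```python
# Minimal sketch of the TTL arithmetic: time left in cache, time between
# two observations of the same cached record, and wall-clock expiry.
from datetime import datetime, timedelta


def minutes_left(ttl_seconds: int) -> float:
    """Remaining cache lifetime, in minutes, for a record with this TTL."""
    return ttl_seconds / 60


def seconds_between(first_ttl: int, later_ttl: int) -> int:
    """Elapsed seconds between two looks at the same cached record."""
    return first_ttl - later_ttl


def expires_at(observed_at: datetime, ttl_seconds: int) -> datetime:
    """Wall-clock time at which the cached record ages out."""
    return observed_at + timedelta(seconds=ttl_seconds)


print(round(minutes_left(5224)))     # 87
print(seconds_between(10697, 5224))  # 5473 (a little over 91 minutes)
```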

In Comcast’s case, Jason tweeted out that they flushed the caches on their DNS resolvers and so people should have been able to get to the site right after that.   In my case, I logged into the web admin menu for my home server/gateway and clicked the button to flush the cache… but that didn’t seem to work (and so I’m going to be raising a ticket with the software distribution).

Others out there may still be unable to get to order.hbonow.com until the TTL expires in their DNS resolvers’ caches.

Speculation On What Happened

Judging from all of this, here is my guess as to what happened.

1. At some point, HBO signed the domain with DNSSEC. The name server (NS) records indicated that the authoritative DNS servers for hbonow.com are at Dyn’s “DynECT” Managed DNS Service.  DynECT provides a way to very easily sign your domain. I use this service myself for several of my personal domains.  You check a couple of boxes and Dyn takes care of all the signing and re-signing for you.  It has worked well for me.

2. A DS Record was uploaded to the .COM TLD. To tie the domain into the “global chain-of-trust” of DNSSEC, a DS record had to be uploaded to the .COM registry.  This DS record provides a fingerprint (a hash) of the DNSKEY record used to sign the domain.   Unfortunately, right now the only way to transmit a DS record to the .COM registry is through the registrar where you registered the domain name.  I know from personal experience that this involves a manual copy-and-paste of the DS Record from Dyn’s web management interface into my registrar’s web management interface.

Next I think either one of two things happened:

3a. The DNSKEY was updated (rolled over) but the DS record at the .COM registry was NOT updated. Within DNSSEC, a Key Signing Key (KSK) is typically rolled over on a schedule set by the DNS operator’s key-management policy. Services operated by DNS operators like Dyn will automagically generate new keys and re-sign the zone. In doing so, the DNS operator will also generate a new DS record. However, there is no automated way for a DNS operator to get the new DS record to the TLD registry. (This is something the industry is discussing right now!)

Given that I’m not finding any other DNSSEC-signed records in the cache of my local DNS resolver, I think the answer may more likely be…

3b. HBO decided to remove DNSSEC signing, but forgot to remove the DS record from the .COM registry.  Someone at HBO may have decided to remove the DNSSEC signing from the domain.  Perhaps the signing was just an experiment by someone on the IT team.  Or perhaps someone decided to remove the DNSSEC signatures in advance of the launch so that there was one less variable during the huge launch with Apple.

But… in removing the DNSSEC signatures they forgot to remove the DS record from the .COM registry.

Again, this goes to the manual nature of this process. Someone could have very easily gone into Dyn’s web management interface and unchecked the box for DNSSEC signing. Easy. Simple. Done.

But they would also have had to log in to the registrar’s web management interface and remove the DS record. This may have been the step that someone forgot.

The problem here is that:

  • IF a DS record EXISTS in a TLD;
  • AND the corresponding DNSKEY record does NOT exist in a domain;
  • THEN an attacker could be trying to substitute in an unsigned set of DNS records.
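Here is that rule as a deliberately simplified Python model. A real validator also checks signatures, algorithms, and the full DS digest (key tags alone can collide); this sketch compares only key tags, and the names are my own. A "bogus" result is why a validating resolver such as Comcast’s returns an error (SERVFAIL) instead of an answer.

```python
# Simplified model of the delegation check described above.
from enum import Enum


class Status(Enum):
    SECURE = "secure"      # a DS at the parent matches a DNSKEY in the child
    INSECURE = "insecure"  # no DS at the parent: the zone is simply unsigned
    BOGUS = "bogus"        # DS exists but no matching DNSKEY: possible attack


def delegation_status(parent_ds_tags: set[int], child_dnskey_tags: set[int]) -> Status:
    """Classify a delegation by comparing DS key tags with DNSKEY key tags."""
    if not parent_ds_tags:
        return Status.INSECURE
    if parent_ds_tags & child_dnskey_tags:
        return Status.SECURE
    return Status.BOGUS


# hbonow.com during the outage: DS 51249 in .COM, no DNSKEY in the zone.
print(delegation_status({51249}, set()))  # Status.BOGUS
# hbonow.com after HBO's fix: no DS in .COM either.
print(delegation_status(set(), set()))    # Status.INSECURE
```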

This is exactly the kind of “attack” that DNSSEC is designed to prevent!

Unfortunately, HBO seems to have “attacked themselves” by missing a step in the operations of DNSSEC.

What Next?

This failure last night really highlights the hole we have in the DNSSEC signing process: there is no easy way for a DNS Operator who is NOT the registrar to update the TLD registry with new records. The failure here stems from the manual cut-and-paste process that currently must be used. I wrote about this in a post:

which in turn pointed over to Olafur Gudmundsson’s post on CloudFlare’s blog:

If we look at the steps of DNSSEC signing today, they look like:

[Image: DNSSEC Signing Steps]

The challenge we have is to somehow improve the communication between the DNS Operators and the Registries.

In the HBO NOW case, if you go back to my speculation, either of two things could have happened with better automation:

1. The DNS Operator could have updated the registry with a new DS record. If the issue was my “3a” above where there was a key rollover without an update to the DS record, the DNS Operator (Dyn in this case) could have updated the .COM registry with the new DS record.

2. The DNS Operator could have signaled the registry to remove the DS record. If the issue was my “3b” where HBO turned off DNSSEC signing, the DNS operator (Dyn) could have signaled the registry to remove the DS record.
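To make those two cases concrete, here is a purely hypothetical Python sketch of the comparison such automation would need: the set of DS records the DNS operator says should be published versus the set the registry currently holds. No real registry, registrar, or DNS operator API is implied, and the post-rollover DS values below are placeholders I made up.

```python
# Hypothetical sketch: derive the DS changes a registry would need to make.
from typing import NamedTuple


class DS(NamedTuple):
    key_tag: int
    algorithm: int
    digest_type: int
    digest: str


def ds_changes(desired: set[DS], published: set[DS]) -> tuple[set[DS], set[DS]]:
    """Return (records to add, records to remove) at the registry."""
    return desired - published, published - desired


old = DS(51249, 7, 1, "90DC90D0578FCFDDF6ED5DE0B35E9652CD2396A8")
new = DS(12345, 7, 1, "00" * 20)  # placeholder DS for a hypothetical new KSK

to_add, to_remove = ds_changes({new}, {old})  # case "3a": key rollover
print(to_add == {new}, to_remove == {old})    # True True
to_add, to_remove = ds_changes(set(), {old})  # case "3b": signing turned off
print(to_add, to_remove == {old})             # set() True
```

A real key rollover would not apply both changes at once; it would add the new DS first and remove the old one only after the relevant TTLs had expired, so that caches never hold a DS with no matching key.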

We need to fix this!

We need to have better automation here so that these kinds of manual issues do not cause failures in the DNSSEC validation process.

If you’d like to help, there is a public mailing list set up for anyone who is interested.  You can join the effort and subscribe at:

https://elists.isoc.org/mailman/listinfo/dnssec-auto-ds

This work will be ongoing for quite some time and will probably wind up in the DNSOP Working Group within the IETF.  It’s a critically important challenge we need to address to bring further automation to DNSSEC deployment and help many more people secure their domains.

Meanwhile, we’ll have to wait to perhaps get some more official news out of Comcast and/or HBO… but this appears to be what happened last night.

Comments, suggestions, feedback or disagreements with my analysis are all welcome as comments!

And… if you want to get started with DNSSEC, please do visit our Start Here page to begin.

