As much as I love public key infrastructure (PKI) and the mathematical security it can provide, it's usually horribly implemented in the real world.
Done right, as its inventors intended, it would be darn near perfect. It's mostly broken because admins don't deploy it right, software doesn't enforce what needs to be enforced, and users click through any PKI warning, resulting in untold downloads of who knows how much malware.
To most users, one of the biggest problems with PKI is invisible: the broken certificate revocation process. Digital certificates are supposed to be revoked when their private keys are compromised or when, for some other reason, the certification authority (CA) that issued them determines they shouldn't be trusted or used. A revoked certificate is supposed to be the same as no certificate.
But what actually happens is that most CA admins never revoke certificates, even when they should. Or people keep using revoked certificates and no one notices. Even more commonly, the certificate gets revoked, but the software (or the user) doesn't bother to check. Worse still, the software checks whether the certificate is revoked, can't get an answer one way or the other, and then fails open, treating the certificate as good and valid (when the opposite is supposed to happen).
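To make the fail-open problem concrete, here is a minimal Python sketch of the decision a client faces after it attempts a revocation check. It's a toy model, not how any particular browser is written: `Status`, `accept_certificate`, and `unreachable_responder` are hypothetical names for illustration, and the lookup itself (a CRL download or an OCSP query in real life) is stubbed out as a function passed in.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"  # server unreachable, timed out, or gave a garbled reply


def accept_certificate(check: Callable[[], Status], hard_fail: bool) -> bool:
    """Return True if the client should trust the certificate.

    check: performs one (hypothetical) revocation lookup.
    hard_fail: if True, an inconclusive check rejects the certificate
    (fail closed); if False, it is accepted anyway (fail open).
    """
    try:
        status = check()
    except OSError:
        # Network failures look exactly like an attacker blocking the lookup.
        status = Status.UNKNOWN
    if status is Status.REVOKED:
        return False
    if status is Status.UNKNOWN:
        return not hard_fail
    return True


def unreachable_responder() -> Status:
    # TimeoutError is a subclass of OSError, so accept_certificate catches it.
    raise TimeoutError("revocation server did not answer")


print(accept_certificate(unreachable_responder, hard_fail=False))  # True: fails open
print(accept_certificate(unreachable_responder, hard_fail=True))   # False: fails closed
```

The lookup fails identically in both calls; the only thing that decides whether the certificate is trusted is the `hard_fail` policy.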
A recent paper on certificate revocation, published by researchers at the University of Maryland, offers excellent data points and conclusions. Here are some highlights:
- Eight percent of the Internet certificates scanned over the course of the survey had been revoked.
- Nearly 1 percent of revoked certificates are still actively used.
- There was a huge jump in revoked certs due to Heartbleed.
- Based on the authors' examination of the revocation-checking behavior of 30 different combinations of Web browsers and operating systems, browsers frequently do not check whether certificates are revoked -- and mobile browsers never check.
- The median certificate revocation list size was 51KB; the max size was 76MB -- quite a variation!
- "Overall, our results paint a bleak picture of the ability to effectively revoke certificates today."
The research paper skewers website administrators and browser vendors in equal measure. Both deserve it.
Another great source of PKI irony is the work of Google researcher Adrienne Felt. Much of her focus is on PKI usability and how users interact with browser PKI and malware warnings. Here are some of my favorite quotes from her research papers:
- "An ideal SSL warning would empower users to make informed decisions and, failing that, guide confused users to safety. Unfortunately, users struggle to understand and often disregard real SSL warnings."
- "During our field study, users continued through a tenth of Mozilla Firefox's malware and phishing warnings, a quarter of Google Chrome's malware and phishing warnings, and a third of Mozilla Firefox's SSL warnings."
- "In Google Chrome, users click through a fifth of malware warnings on average."
- "We also find that user behavior varies across warnings. In contrast to the other warnings, users continued through 70.2 percent of Google Chrome's SSL warnings."
- "We show that warning design can drive users toward safer decisions."
It's good to know that researchers and software designers aren't entirely blaming users for making bad choices, although why people click through repeated warnings to reach a malicious website or file is beyond me.
Most people think that public CAs, especially the big ones that have been around for decades, are super-trustworthy and reliable. Our software trusts the certificates they issue by default, and we often pay a premium for the certificates and services we buy from them. But even public PKI vendors have their issues.
When you're a PKI vendor, your No. 1 duty is to verify the identity of the subjects requesting particular certificates so that consumers of those certificates can trust that they really belong to whoever they claim to belong to. Unfortunately, we've had bad and compromised public CAs for a long time now. I first reported some of the early CA compromises in 2011. After years of public debacles, you'd think all public CAs would understand their importance to the process and implement strict controls and security. Apparently there are a few stragglers.
Symantec (which took over VeriSign's massive public PKI business) pissed off Google -- and it came to a head last week. Not only did Symantec employees create and accidentally release "test" certificates for Google.com (and other big-brand domains), but the company also issued questionable EV (extended validation) certs, the kind that are supposed to be more secure. It also issued certificates for almost 2,500 Web domains that were never registered. How does that even happen?
There's no excuse other than poor controls and enforcement. To be fair, Symantec fired some of the involved employees and will certainly improve its processes. It has to. Google, unfairly or not, is threatening to untrust all Symantec-issued certificates if the company does not resolve its problems and undergo a PKI audit. It's pretty damning information and a pretty big threat. I'm still not sure who I feel for more, but as a security person, I guess I have to fall in the camp of the team trying to improve user safety.
Actually, my biggest problem with revocation isn't with public CAs. It's with privately revoked certificates. Most of my customers run private PKIs to authenticate users and computer devices in their environment. More and more, these privately issued certificates are used as the ultimate attestation of whether a user or computer should be on the network and able to access critical resources.
Part of relying on a private PKI process is being able to rely on the revocation process. If employees or devices get separated from the business, organizations want to be able to revoke their certificates and be assured that those users and computers can't get back on the network.
The current standard revocation processes (which rely on certificate revocation lists, or CRLs, and the Online Certificate Status Protocol, or OCSP) have way too much latency built in. On top of that, most clients cache the results of previous revocation checks for the lifetime of the CRL, which means that, in practice, when an organization revokes a cert, it can take a day or longer before the relevant software notices (assuming the software even looks, which it often doesn't).
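A back-of-the-envelope model shows where that day-plus delay comes from. The sketch below is a simplification under stated assumptions: the CA publishes a fresh CRL the instant a cert is revoked, and the client reuses its cached copy until it expires, then refreshes on a fixed schedule. `first_noticed` is a hypothetical helper for illustration, not part of any real client.

```python
from datetime import datetime, timedelta


def first_noticed(revoked_at: datetime, last_fetch: datetime,
                  crl_lifetime: timedelta) -> datetime:
    """Earliest moment a CRL-caching client can learn about a revocation.

    Assumes the client fetched a CRL at last_fetch, reuses it until it
    expires, and then refreshes every crl_lifetime.
    """
    if revoked_at < last_fetch:
        raise ValueError("model assumes the revocation happens after the last fetch")
    refresh = last_fetch + crl_lifetime
    # Step through the client's refresh schedule until a refresh happens
    # at or after the revocation; that refresh picks up the new CRL.
    while refresh < revoked_at:
        refresh += crl_lifetime
    return refresh


fetched = datetime(2016, 1, 4, 9, 0)   # client downloads the CRL at 9:00 a.m.
revoked = datetime(2016, 1, 4, 9, 5)   # admin revokes the cert five minutes later
print(first_noticed(revoked, fetched, timedelta(hours=24)) - revoked)  # 23:55:00
```

With a 24-hour CRL lifetime, a revocation made just after the client's last download goes unseen for nearly a full day; a shorter lifetime shrinks that window proportionally.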
PKI consumers want real-time revocation. They want a PKI admin to revoke a cert -- and to know immediately that the cert is bad and can't be relied upon. This doesn't happen much in the real world, at least not in a timely manner. Unfortunately, many private PKI admins don't know that. They think once they revoke a cert, that cert can't be used any longer. But who can blame them? They're doing what they've long been told works. It doesn't.
Pushing toward a solution
I thought about creating a real-time revocation RFC (request for comments), but I found out that the PKI-related committee that considers such standards for the Internet was disbanded. What? PKI is taking off in a huge way both privately and publicly, and the international standards body that discusses and implements new PKI rules no longer meets?!?
Several vendors, including Microsoft and Google, have implemented privately maintained and issued revocation lists and functionality. Microsoft distributes its main list through its normal update process. It's available to any application running on Microsoft Windows, and it mostly still conforms to the traditional, standard revocation processes. Also, if you use Microsoft's free Enhanced Mitigation Experience Toolkit, it checks whether the certificate presented when you connect to certain major sites is the legitimate one.
Google links its private revocation list to its Chrome browser product. Other vendors have their proprietary lists, too. Rarely do these private revocation lists agree with each other. That's a problem.
Google has even invented an entirely new approach, called Certificate Transparency, which publicly logs certificates as they're issued so that bogus ones can be spotted, and is trying to push it as an open standard. The company even has an industry luminary, Ben Laurie -- everyone I know admires and trusts him -- pushing for it. There's a lot to love about the attempt.
Unfortunately, it's mostly only used by Google for now. I'm pretty sure it's not the answer, because it requires that all issuing CAs actively push the certificates they issue to other participants. Traditional revocation models have each consuming application pull from individual lists distributed by each authoritative issuing CA (much as DNS works).
In today's computer security world, distributed, decentralized systems in which clients pull, rather than being forced to push, seem to work better at scale. Hence, Google's own certificate transparency logs have only 0.35 percent participation at this point. (Others have criticized the certificate transparency model, too.)
I'm not sure an entirely new model of revocation is needed. If we simply enforce the existing standards with CRLs and OCSP (including OCSP stapling), most of the issues would be resolved. Get rid of CRL and OCSP caching, or at least shorten the maximum cache lifetime to a few minutes, and I think we'd solve 99 percent of the problems.
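Here's a minimal sketch of what that fix might look like on the client side. It assumes a lookup function that answers whether a serial number is revoked, standing in for an OCSP query or a fresh CRL download. `ShortLivedRevocationCache`, `query`, and the five-minute `max_age` default are illustrative choices, not a real library's API. The sketch does two things: it never reuses an answer older than `max_age`, and it fails closed when it can't get a fresh one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, Tuple


@dataclass
class ShortLivedRevocationCache:
    """Hypothetical client cache with a capped answer lifetime and hard fail."""
    query: Callable[[int], bool]  # True if the serial is revoked; may raise OSError
    max_age: timedelta = timedelta(minutes=5)

    def __post_init__(self) -> None:
        # serial -> (revoked?, time the answer was fetched)
        self._entries: Dict[int, Tuple[bool, datetime]] = {}

    def is_trusted(self, serial: int, now: datetime) -> bool:
        cached = self._entries.get(serial)
        if cached is not None and now - cached[1] < self.max_age:
            revoked = cached[0]  # answer is still fresh enough to reuse
        else:
            try:
                revoked = self.query(serial)
            except OSError:
                return False  # fail closed: no fresh answer means no trust
            self._entries[serial] = (revoked, now)
        return not revoked


revoked_serials: set = set()
cache = ShortLivedRevocationCache(query=lambda serial: serial in revoked_serials)
t0 = datetime(2016, 1, 4, 12, 0)
print(cache.is_trusted(0x1234, t0))                          # True: not revoked yet
revoked_serials.add(0x1234)                                  # the PKI admin revokes the cert
print(cache.is_trusted(0x1234, t0 + timedelta(minutes=2)))   # True: cached answer still reused
print(cache.is_trusted(0x1234, t0 + timedelta(minutes=6)))   # False: cache expired, re-queried
```

Even with a short cache, a revocation can go unnoticed for up to `max_age`, but that window is minutes rather than a day. The cost is more frequent lookups against the revocation server.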
We don't need to reinvent revocation, only repair it a little. Fixing an existing process on the Internet is likely to be far more successful than going back to the drawing board.