
MIT 6.824: Lecture 18 - Certificate Transparency
02 Dec 2020 · 6 min read

This lecture is about building systems out of mutually untrustworthy components,
using the Web as a case study. The systems we have seen so far are closed systems
for which we have assumed that all the participants are trustworthy. But in an open
system like the Web where anyone can take part, and there is no universally trusted
authority, trust and security are top-level issues to address.

A fundamental challenge in building open systems is verifying the identity of each
component involved. We can frame that challenge as each computer in the system
asking: Am I talking to the right computer?

Certificate Transparency (CT) aims to help answer this question, but before going into
CT, I'll give a brief tour of the evolution of security on the Web.

Without HTTPS (Man-in-the-middle attacks)
HTTPS
How HTTPS works
Certificate Authorities protect us...
...But they can go rogue
Can we have an online database of valid certificates?
Certificate Transparency
Certificate Logs
Monitors
Auditors
Everyone must see the same logs
Conclusion
Further Reading

Without HTTPS (Man-in-the-middle attacks) #


A man-in-the-middle attack happens when a third party intercepts a connection
between a user and an application, as illustrated in the figure below[1].

Figure 1: Man-in-the-middle attack (figure labels: original connection, new connection).

When an attacker intercepts an HTTP connection, they can read and change any
packets being sent over the network. These packets may contain anything from
passwords to bank details to other private information that a user never
intended an intruder to see.

HTTPS #
HTTPS was invented to make communication over the internet more secure. It takes
the original HTTP protocol and adds a layer of security known as SSL/TLS. I'll
refer to this layer as TLS for the rest of the post.

With TLS (Transport Layer Security), you only send and receive encrypted data over
the network, and only a secret key agreed on by your computer and the site you're
visiting can decrypt this data. Thus, while an attacker can still intercept your HTTPS
connection, they cannot make sense of the transmitted packets.

How HTTPS works #

TLS is based on public-key cryptography, which means here that the server has a
public/private key pair. The server exposes the public key and keeps the private key
private. When a client encrypts data using the server's public key, only the private key
can decrypt it.
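
To make the public/private key idea concrete, here is a minimal sketch using textbook RSA with tiny primes. The primes, the exponent, and the function names are my own choices for illustration; real TLS keys are vastly larger and use padding schemes that this sketch leaves out.

```python
# Toy RSA key pair built from tiny primes. For illustration only, never for real use.
P, Q = 61, 53
N = P * Q                           # modulus, shared by both keys
E = 17                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (modular inverse, Python 3.8+)

PUBLIC_KEY = (N, E)    # the server publishes this
PRIVATE_KEY = (N, D)   # the server keeps this to itself


def encrypt(message: int, public_key) -> int:
    """Anyone holding the public key can encrypt a number smaller than N."""
    n, e = public_key
    return pow(message, e, n)


def decrypt(ciphertext: int, private_key) -> int:
    """Only the holder of the private key can recover the original number."""
    n, d = private_key
    return pow(ciphertext, d, n)


secret = 65
ciphertext = encrypt(secret, PUBLIC_KEY)
recovered = decrypt(ciphertext, PRIVATE_KEY)
print(secret, ciphertext, recovered)
```

The message is a small integer here because textbook RSA works on numbers below the modulus.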

The first step in enabling HTTPS for a server is to get a certificate from a Certificate
Authority (CA). This certificate is an ID for the server that contains its domain name,
information about its owners, the server's public key, the CA's identity, and a digital
signature produced by the CA.

At a high level, when your browser connects to an HTTPS server:

The server responds with its certificate to prove its identity to the browser.

Your browser then checks the validity of this certificate using the digital signature
from the CA. I'll describe how this works soon.

After verifying the certificate, your browser generates a random key and encrypts it
using the server's public key. It then sends this encrypted key to the server as a
challenge: the server proves that it holds the private key corresponding to the
public key by decrypting the message.

Once the server decrypts this random key, each party is satisfied that it is talking
to the right party, and both agree to use the key for subsequent communication.

These steps make up the TLS handshake. After the handshake is complete, both
parties encrypt HTTP requests and responses using the key they agreed on, which
only the other party can decrypt.
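
The handshake steps above can be sketched roughly as follows. This is a simplified model of the flow described in this post, not of any real TLS implementation: the `Server` and `Browser` classes, the toy RSA key, and `derive_session_key` are all hypothetical names I made up, and certificate validation is reduced to a placeholder domain check. Note also that modern TLS versions such as TLS 1.3 agree on keys with a Diffie-Hellman exchange rather than by encrypting a random key to the server.

```python
import hashlib
import secrets

# Toy RSA key pair for the server (tiny primes, illustration only).
P, Q, E = 61, 53, 17
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))


def derive_session_key(random_key: int) -> bytes:
    """Both sides turn the shared random number into a symmetric key."""
    return hashlib.sha256(str(random_key).encode()).digest()


class Server:
    def __init__(self, domain: str):
        self.certificate = {"domain": domain, "public_key": (N, E)}
        self._private_key = (N, D)
        self.session_key = None

    def hello(self) -> dict:
        # Step 1: respond with the certificate.
        return self.certificate

    def receive_challenge(self, encrypted_key: int) -> None:
        # Step 3: only the private key holder can recover the random key.
        n, d = self._private_key
        self.session_key = derive_session_key(pow(encrypted_key, d, n))


class Browser:
    def __init__(self):
        self.session_key = None

    def connect(self, server: Server, expected_domain: str) -> None:
        cert = server.hello()
        # Step 2: placeholder for real validation (CA signature check, expiry, ...).
        if cert["domain"] != expected_domain:
            raise ValueError("certificate is for a different domain")
        n, e = cert["public_key"]
        random_key = secrets.randbelow(n - 2) + 2   # random number in [2, n - 1]
        server.receive_challenge(pow(random_key, e, n))
        self.session_key = derive_session_key(random_key)


server = Server("example.com")
browser = Browser()
browser.connect(server, "example.com")
print(browser.session_key == server.session_key)
```

After `connect` returns, both objects hold the same session key without that key ever crossing the network unencrypted.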

Certificate Authorities protect us... #

Without the digital signature from a Certificate Authority, anyone can create a
certificate falsely claiming to be, say, 'netflix.com' and get your browser to trust them.
The handshake alone only proves that the server holds the private key matching the
public key in the certificate it presented, which a fake server with a self-made
certificate can do just as well. To prevent this, only a small number of authorized
CAs may issue certificates.

Like servers, CAs also have a public/private key pair. When a CA issues a certificate, it
encrypts the certificate's content (in practice, a hash of it) with its private key and
uses the result as the digital signature for the certificate. Anyone can decrypt the
signature using the CA's public key.

Each browser comes with a pre-installed list of the public keys of all the CAs it trusts.
When your browser receives a server's certificate, it first checks whether the issuing
CA is on that list, then decrypts the digital signature using the CA's public key. If the
decrypted content matches the certificate, your browser is sure that a trusted CA issued
the certificate and continues the TLS handshake.
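
Here is a rough sketch of that sign-and-verify flow, using textbook RSA over a SHA-256 hash of the certificate body. Everything in it is an assumption for illustration: the "Example CA" name, the certificate fields, the JSON encoding, and the toy key built from two Mersenne primes. Real certificates use a standardized encoding and proper signature padding.

```python
import hashlib
import json

# Toy RSA key pair for the CA. These primes are far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
CA_N = P * Q
CA_E = 65537
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))   # the CA's private exponent


def digest(cert_body: dict, modulus: int) -> int:
    """Hash the certificate body and reduce it to a number below the modulus."""
    encoded = json.dumps(cert_body, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(encoded).digest(), "big") % modulus


def ca_sign(cert_body: dict) -> int:
    """The CA 'encrypts' the hash with its private key; the result is the signature."""
    return pow(digest(cert_body, CA_N), CA_D, CA_N)


# The browser's pre-installed list of trusted CA public keys.
TRUSTED_CAS = {"Example CA": (CA_N, CA_E)}


def browser_verify(cert_body: dict, signature: int) -> bool:
    ca_key = TRUSTED_CAS.get(cert_body["issuer"])
    if ca_key is None:
        return False                       # issued by a CA the browser does not trust
    n, e = ca_key
    # 'Decrypt' the signature with the CA's public key and compare with our own hash.
    return pow(signature, e, n) == digest(cert_body, n)


cert_body = {"domain": "example.com", "public_key": "server-key", "issuer": "Example CA"}
signature = ca_sign(cert_body)

forged_body = dict(cert_body, public_key="attacker-key")
print(browser_verify(cert_body, signature), browser_verify(forged_body, signature))
```

Changing any field of the certificate changes its hash, so the old signature no longer matches, and an attacker cannot produce a new one without the CA's private key.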

...But they can go rogue #

Unfortunately, CAs can get compromised or go rogue and end up issuing "bogus"
certificates, i.e., a CA may issue the certificate for a domain name to the wrong owner.
This has happened before. Since any CA can issue a certificate for any domain name,
the least secure CA limits the overall security of the certificate mechanism.

Thus, while HTTPS can increase our confidence that we are talking to the right
computers, it is not enough.

Can we have an online database of valid certificates? #

To limit the effect of a bogus certificate, ideally our browsers would detect and
reject it. One way this could work is if there were a database of all the valid
certificates in existence that our browsers could query. This comes with several
questions, though:

Given that there's no single authority that the entire world trusts, who would run this
database?
How do we decide who owns a domain name?
How do we handle situations where people change their CAs, renew their certificates
or lose their private key and have to request a new one? These will all look like a
second certificate for an existing domain name.

Certificate Transparency is an approach to answering these questions, and we'll look at
it next.

Certificate Transparency #
Certificate Transparency (CT) is a system for making the existence of all certificates
publicly available to domain owners, CAs, and browsers. This way, when a rogue CA
issues a certificate for 'netflix.com' to the wrong person, the certificate is immediately
visible to the right owners for them to act on it.

CT works by introducing three components to the certificate system: certificate logs,
monitors, and auditors.

Certificate Logs #

Certificate logs contain an append-only record of certificates. Anyone can add a
certificate to the logs, though typically only certificate authorities do. When a
CA issues a new certificate, it must add it to the logs. Anyone can also query a log
to verify that it contains a certificate.

Certificate logs are hosted on a group of servers spread over the world and can be
managed independently by a CA or any interested party.
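
As a mental model, a certificate log behaves like the small class below. This is only a sketch of the interface described above (append, query, read new entries); the `CertificateLog` class and its method names are hypothetical, and a real log also has to prove its append-only behaviour cryptographically rather than just promise it.

```python
class CertificateLog:
    """An in-memory, append-only record of certificates."""

    def __init__(self):
        self._entries = []

    def append(self, certificate: dict) -> int:
        """Add a certificate and return its position in the log."""
        self._entries.append(dict(certificate))   # store a copy so callers can't edit it
        return len(self._entries) - 1

    def contains(self, certificate: dict) -> bool:
        """Anyone can ask whether a certificate has been logged."""
        return certificate in self._entries

    def entries_since(self, index: int) -> list:
        """Return copies of the entries at position `index` and later."""
        return [dict(entry) for entry in self._entries[index:]]

    def size(self) -> int:
        return len(self._entries)

    # Deliberately no delete or update methods: entries are never removed or changed.


log = CertificateLog()
cert = {"domain": "example.com", "public_key": "server-key", "issuer": "Example CA"}
position = log.append(cert)
print(position, log.contains(cert), log.size())
```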

Monitors #

Monitors are servers that periodically check the log servers to see whether a CA has
issued any suspicious certificates for the domain names they watch. They are hosted by
organizations which manage a set of domain names. For example, a company like
Netflix could host its own monitors and run periodic checks on the certificate logs
for any suspicious certificates issued for 'netflix.com'.

When a monitor detects a suspicious certificate, I believe there is a manual step
involved where a human checks whether the certificate is actually OK or was wrongly
issued. There is a certificate revocation process to get rid of bad certificates.
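
A monitor's periodic check could look something like the sketch below. I'm assuming here that the domain owner keeps a set of the public keys it has actually requested certificates for, and that log entries are simple dictionaries; both are my own simplifications, and `find_suspicious` is a hypothetical helper.

```python
def find_suspicious(log_entries, watched_domain, known_public_keys):
    """Return logged certificates for our domain that carry a key we never requested."""
    return [
        cert
        for cert in log_entries
        if cert["domain"] == watched_domain
        and cert["public_key"] not in known_public_keys
    ]


# Entries fetched from a certificate log (made-up data for illustration).
log_entries = [
    {"domain": "example.com", "public_key": "our-key-2020", "issuer": "Example CA"},
    {"domain": "other.example", "public_key": "their-key", "issuer": "Example CA"},
    {"domain": "example.com", "public_key": "unknown-key", "issuer": "Rogue CA"},
]

suspicious = find_suspicious(log_entries, "example.com", {"our-key-2020"})
for cert in suspicious:
    # In practice this is where a human would be alerted to investigate.
    print("needs review:", cert["issuer"], cert["public_key"])
```

Flagging rather than rejecting automatically matches the manual step mentioned above: a renewed certificate or a change of CA would also show up here and turn out to be fine.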

Auditors #

An auditor runs in a web browser and checks whether the certificate it receives from a
server has been registered in the certificate logs.

These components work together to bring openness to the SSL certificate system.
Quoting the lecture notes:

If browsers and monitors see the same log, and monitors raise an alarm if
there's a bogus cert in the log, and browsers require that each cert they use is
in the log, then browsers can feel safe using any cert that's in the log.

Note that auditors and monitors also communicate with each other to exchange
information about the logs.

Everyone must see the same logs #

For certificate transparency to work, there are two critical requirements for the logs:

No deletion: This requirement prevents a situation where a log server claims that a
bogus certificate is in the log and shows it to the browser, but then the log operator
deletes it from the log before the monitor can detect it.
No equivocation: All parties must see the same log content. Otherwise, a log server
could show browsers a log with the bogus certificate, and show the monitor a log
without the certificate.

With these requirements, if a CA issues a bogus certificate for a domain name, it must
add the certificate to the log. And since a log operator can't delete it, the domain
name's owner will eventually see it. But meeting these requirements is difficult because,
like CAs, log operators can also get compromised and may even conspire with
malicious CAs.
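
To see what the two requirements rule out, here is a naive sketch that compares two views of the same log by checking whether the shorter one is a prefix of the longer one. The function names and the list-of-strings representation are assumptions of mine, and comparing full copies like this would not scale; it only illustrates what "same log" means.

```python
from typing import List


def is_prefix(older: List[str], newer: List[str]) -> bool:
    """An append-only log only grows, so an older view must be a prefix of a newer one."""
    return len(older) <= len(newer) and newer[: len(older)] == older


def views_agree(view_a: List[str], view_b: List[str]) -> bool:
    """Two parties saw the same log if one view is simply a shorter version of the other."""
    shorter, longer = sorted((view_a, view_b), key=len)
    return is_prefix(shorter, longer)


# Honest log: the monitor is just slightly behind the browser.
browser_view = ["cert-a", "cert-b", "bogus-cert"]
monitor_view = ["cert-a", "cert-b"]
print(views_agree(browser_view, monitor_view))

# Deletion: the bogus certificate was removed before the monitor looked again.
later_view = ["cert-a", "cert-b", "cert-c"]
print(is_prefix(browser_view, later_view))

# Equivocation: the log shows the monitor a different history of the same length.
forked_view = ["cert-a", "cert-b", "harmless-cert"]
print(views_agree(browser_view, forked_view))
```

Both kinds of misbehaviour show up as a failed prefix check, provided the browser and the monitor have some way to compare what they were shown.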

To show that it isn't violating any of the requirements, a certificate log must be able to
prove two things:

That a particular certificate is in the log.
That if it is showing a version with new certificates added, that version is consistent
with the previous version. Proving this confirms that a log operator has modified no
certificates in the log and the log has never been branched or forked.

It does this by storing the certificates in a Merkle Tree data structure. I won't go into
the details of that here, but I recommend reading this post if you're interested in that.
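
For a rough feel of the first kind of proof, here is a small sketch of a Merkle tree with an inclusion proof. I've followed the general hashing layout used by Certificate Transparency (a prefix byte that separates leaf hashes from interior hashes, and splitting at the largest power of two below the tree size), but treat it as an illustration under those assumptions rather than a reference implementation. All function names are my own.

```python
import hashlib
from typing import List


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    """Largest power of two strictly less than n (for n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(leaves: List[bytes]) -> bytes:
    if not leaves:
        raise ValueError("this sketch does not handle an empty log")
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = split_point(len(leaves))
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


def inclusion_proof(index: int, leaves: List[bytes]) -> List[bytes]:
    """Sibling hashes needed to rebuild the root from leaves[index], leaf level first."""
    if len(leaves) == 1:
        return []
    k = split_point(len(leaves))
    if index < k:
        return inclusion_proof(index, leaves[:k]) + [merkle_root(leaves[k:])]
    return inclusion_proof(index - k, leaves[k:]) + [merkle_root(leaves[:k])]


def root_from_proof(index: int, size: int, leaf: bytes, proof: List[bytes]) -> bytes:
    """Recompute the root using only one leaf, its position, and the proof."""
    if size == 1:
        if proof:
            raise ValueError("proof is too long")
        return leaf_hash(leaf)
    if not proof:
        raise ValueError("proof is too short")
    k = split_point(size)
    sibling, rest = proof[-1], proof[:-1]
    if index < k:
        return node_hash(root_from_proof(index, k, leaf, rest), sibling)
    return node_hash(sibling, root_from_proof(index - k, size - k, leaf, rest))


certs = [b"cert-a", b"cert-b", b"cert-c", b"cert-d", b"cert-e"]
root = merkle_root(certs)                 # what the log publishes
proof = inclusion_proof(2, certs)         # what the log hands to an auditor
print(root_from_proof(2, len(certs), b"cert-c", proof) == root)
```

The point of the structure is that the auditor only needs the published root and a handful of hashes, not the whole log, to check that a certificate is included.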

Conclusion #
The key property of Certificate Transparency is that everyone sees the same logs.
Because of this, domain owners can detect when CAs have issued bogus certificates for
their domain name, and browsers can be confident that any certificates in the log are
approved by their owners, which means they are talking to the right servers.

Finally, note that Certificate Transparency does not completely prevent the effect of
bogus certificates. There might be a window where the certificates may dupe a
browser before the monitors can detect them. What CT offers, though, is a system for
quicker detection of these certificates, which will reduce their effect.

[1]: Image lifted from this post by Imperva.

Further Reading #
Lecture 18: Certificate Transparency - MIT 6.824 lecture notes
Certificate Transparency FAQ - Additional material from 6.824
Executing a Man-in-the-Middle Attack in just 15 Minutes by Patrick Nohe
How does SSL actually work? by Robert Heaton
What is Certificate Transparency?
How Certificate Transparency Works
Transparent Logs for Skeptical Clients
How Log Proofs Work
