What you see on the internet cannot be trusted.
This may seem like an understatement in the age of disinformation, deep fakes, and AI-generated slop, but on a deeper level, the internet is insecure by design. The internet can connect anyone, anywhere to anyone else, anywhere else in the world. While this comes with many benefits, it also comes with inherent risk, since many of those people using the internet have malicious intent. Internet users are at risk from an almost endless number of attack vectors, and browsing the web can be dangerous. Users can be tricked into clicking malicious links that lead to phony websites, often imitating real ones. They then end up entering private information into the phony websites or even downloading malware. When the user types in the name of a website, say for example their bank’s or school’s website, how do they know for sure what they are connecting to is the real thing?
🤝 Establishing Trust: Certificates
Establishing trust in the untrusted web is the core challenge in online security. Websites need a way to validate their authenticity to users who want to use them, and end users need to know they can trust the websites they use.
Often, when you purchase valuable or rare items like artwork, they will come with a certificate of authenticity. This certificate will include information about the item, as well as a signature from its creator (in the case of art) or some other trustworthy authority, attesting to its authenticity.
Something similar is used to verify a website’s authenticity. Certificates, signed by trusted certificate authorities, are issued to websites to prove the site is genuine. Most certificates follow the X.509 format, and include certain information about the domain they are authenticating. X.509 certificates divide the domain information into fields, including:
- Serial Number: Unique identifier of the certificate
- Issuer: The Authority that issued and signed the certificate
- Validity period: Essentially an expiration date for the certificate
- Subject: This field actually contains several sub fields, including the Common Name – the domain name this certificate is issued to – as well as the name of the organization that owns the domain
- Signature algorithm: The algorithm used to sign the certificate (more on that later)
- Subject Alternative Name (SAN): Additional domain names or subdomains that this certificate can apply to
- A digital signature
The certificate is presented by the web server when a browser opens a secure connection to the site, and the browser uses this certificate to make sure the website is genuine.
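To make the fields above a little more concrete, here is a minimal Python sketch that pulls them out of a certificate. It assumes the dictionary layout returned by the standard library's `ssl.SSLSocket.getpeercert()`. The function names `summarize_certificate` and `fetch_certificate`, and the sample certificate itself, are made up for illustration.

```python
import socket
import ssl


def summarize_certificate(cert: dict) -> dict:
    """Pick out the fields discussed above from a getpeercert()-style dict."""
    # Subject and issuer arrive as nested tuples of (name, value) pairs.
    subject = {name: value for rdn in cert.get("subject", ()) for name, value in rdn}
    issuer = {name: value for rdn in cert.get("issuer", ()) for name, value in rdn}
    return {
        "serial_number": cert.get("serialNumber"),
        "issuer": issuer.get("organizationName") or issuer.get("commonName"),
        "valid_from": cert.get("notBefore"),
        "valid_until": cert.get("notAfter"),
        "common_name": subject.get("commonName"),
        "san": [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"],
    }


def fetch_certificate(hostname: str, port: int = 443) -> dict:
    """Connect over TLS and return the server's validated certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()


# A made-up certificate in the same shape getpeercert() returns (not real data).
SAMPLE_CERT = {
    "subject": ((("organizationName", "Example Org"),), (("commonName", "example.com"),)),
    "issuer": ((("organizationName", "Example Trust CA"),), (("commonName", "Example CA 1"),)),
    "serialNumber": "0A1B2C3D",
    "notBefore": "Jan  1 00:00:00 2025 GMT",
    "notAfter": "Apr  1 00:00:00 2025 GMT",
    "subjectAltName": (("DNS", "example.com"), ("DNS", "www.example.com")),
}

summary = summarize_certificate(SAMPLE_CERT)
for field, value in summary.items():
    print(f"{field}: {value}")
```

With network access, `summarize_certificate(fetch_certificate("example.com"))` should give the same kind of summary for a live site.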
To understand how these certificates work, let’s cover a little cryptography.
🔐 Cryptography, hashes, and digital signatures
If you have spent any amount of time studying cybersecurity, you have at least heard the terms “symmetric” and “asymmetric” encryption. To give a brief overview, symmetric encryption uses the same encryption key to encrypt and to decrypt data, whereas asymmetric encryption uses a pair of keys, a public key and a private key. The public key is shared with other people, and the private key is kept under tight guard. Data encrypted with one key can only be decrypted with its sister key. So long as the owner of the key pair keeps the private key secure, this process can be a good way of validating someone’s identity. Asymmetric cryptography, such as the RSA encryption system, is the foundation of digital certificates.
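Here is a toy illustration of the key pair idea, using textbook RSA with tiny primes. This is only a sketch: the numbers are far too small to be secure, real RSA adds padding, and the helper name `apply_key` is made up. It needs Python 3.8 or newer for the modular inverse.

```python
# Toy RSA key pair built from tiny primes (real keys use primes hundreds of digits long).
p, q = 61, 53
n = p * q                    # the modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent: (e, n) is the public key
d = pow(e, -1, phi)          # private exponent: (d, n) is the private key


def apply_key(value: int, exponent: int) -> int:
    """Encrypting and decrypting are the same operation with different exponents."""
    return pow(value, exponent, n)


message = 65  # any number smaller than n

# Encrypt with the public key, decrypt with the private key.
ciphertext = apply_key(message, e)
recovered = apply_key(ciphertext, d)

# The other direction works too: only the private key holder can produce
# 'sealed', but anyone with the public key can open it.
sealed = apply_key(message, d)
opened = apply_key(sealed, e)

print(message, ciphertext, recovered, opened)
```

The second direction is the one that matters for validating identity: if the public key opens the value correctly, the private key must have produced it.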
Another essential component of certificates is hashing. A hash is like a fingerprint for data. Any kind of data, whether it’s a large file or a single letter, when run through a hashing algorithm, will produce a fixed-length string of text called a hash. With a good algorithm, these hashes are effectively unique (finding two different inputs with the same hash is computationally infeasible), always the same length, and change completely if the original data changes even slightly. For example, using the SHA-256 hashing algorithm with this website’s name (BlueTopazSec) produces the hash “922b872c43e658f4188df002f17600ba64c895f21415d2494254213b2ff3d864”. However, simply making the “B” in the name lowercase will change the resulting hash entirely: “f620c4aa7273bf750d07c36dcfe3aa553ddd4f19daabd2146ae93d8248dc85c7”. This makes hashes a fantastic way to check if data has been tampered with.
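You can try this yourself with Python's built-in `hashlib` module. The sketch below assumes the text is UTF-8 encoded with no trailing newline; a different encoding or a stray newline would give a different hash. The helper name `sha256_hex` is made up.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a string as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


original = sha256_hex("BlueTopazSec")
changed = sha256_hex("blueTopazSec")      # only the first letter differs
large = sha256_hex("a" * 1_000_000)       # a much larger input

print(original)
print(changed)
print(len(original), len(changed), len(large))  # all the same length
```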
When we combine hashing and asymmetric encryption, we can produce what is called a digital signature. A digital signature is essentially a hash that is encrypted with a private key. In the case of web certificates, a hash of the certificate’s contents is encrypted with the signer’s private key, and the resulting signature is attached to the certificate. When loading the website, the browser uses the signer’s public key to decrypt the signature and recover the hash. It then computes its own hash of the certificate’s contents and compares it with the one it recovered. If the two match, the browser knows not only that the certificate has not been tampered with, but also that it was signed by the owner of the private key.
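The sign-then-verify flow can be sketched in a few lines of Python. This is a toy, not how production signatures are built: it uses textbook RSA without padding, a key made from two small, well-known Mersenne primes, and made-up names (`digest_as_int`, `sign`, `verify`, and the sample certificate text).

```python
import hashlib

# Toy RSA key pair from two well-known Mersenne primes. Far too small for real use.
P, Q = 2**61 - 1, 2**31 - 1
N = P * Q
E = 65537                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent (Python 3.8+)


def digest_as_int(data: bytes) -> int:
    """Hash the data and squeeze the hash into the range the toy key can handle."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Signer side: 'encrypt' the hash with the private key."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Verifier side: 'decrypt' the signature with the public key and compare hashes."""
    return pow(signature, E, N) == digest_as_int(data)


certificate = b"subject=example.com; public_key=ABC123"
signature = sign(certificate)

tampered = b"subject=example.com; public_key=EVIL99"

print(verify(certificate, signature))   # the untouched data checks out
print(verify(tampered, signature))      # any change breaks the match
```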
However, anyone can create and sign a certificate for their website, including malicious ones. To actually provide trust, the digital signature which signs the certificate must come from a trusted third party. This is where Certificate Authorities come into play.
📝 Certificate Authorities and Public Key Infrastructure
A Certificate Authority (CA) is a trusted entity, usually a company, that issues signed certificates to people or organizations who request them. A number of public certificate authorities are trusted by web browsers, which ship with each CA’s own certificate (and therefore its public key) pre-installed so they can verify that CA’s signatures. This system is known as Public Key Infrastructure, or PKI.
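Browsers are not the only software with a pre-installed list of trusted CAs. As a rough illustration, Python's standard `ssl` module can build a client configuration that relies on the operating system's trust store. Note that `get_ca_certs()` may report few or no entries on systems that load CA certificates lazily from a directory, so the count printed here varies from machine to machine.

```python
import ssl

# Build a client-side TLS configuration that trusts the system's CA certificates.
context = ssl.create_default_context()

# The default context insists on a valid certificate chain and a matching hostname.
print("certificate required:", context.verify_mode == ssl.CERT_REQUIRED)
print("hostname checked:", context.check_hostname)

# List some of the trusted CA certificates that were loaded up front.
trusted = context.get_ca_certs()
print("CA certificates loaded:", len(trusted))
for ca in trusted[:3]:
    issuer = {name: value for rdn in ca.get("issuer", ()) for name, value in rdn}
    print(" -", issuer.get("organizationName") or issuer.get("commonName"))
```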
When someone (a person or organization) wants to secure a website with a certificate, they make a request to a known public CA. This request, known as a Certificate Signing Request (CSR), includes the domain name of the website they want certified, as well as the website’s public key. Once the CA has verified the requestor (at a minimum, that they control the domain), it provides a certificate, signed with the CA’s private key. This certificate contains information about the domain, the signing CA, an expiration date, the domain’s public key, and the digital signature produced with the CA’s private key.
Because the certificate contains the website’s public key, and is signed with the CA’s private key, it is virtually impossible for this certificate to be forged or misused by an impostor. Anyone trying to impersonate a site, say bluetopazsec.com, would not be able to use the genuine certificate on their fake website, because they do not have the private key that accompanies the public key in the certificate. The public key in the certificate cannot be changed, because then the certificate would not match the digital signature issued by the CA, and web browsers would reject the certificate and block the connection.
✉️ Protecting data in transit: Transport Layer Security
This process is very effective at verifying a website’s identity. However, it does nothing to protect information passing between user and website from eavesdropping. The connection between client and server should be encrypted. Asymmetric encryption works for validating the holder of the private key and creating digital signatures. However, it is less useful for encrypting large amounts of data. For one thing, it is slower and more computationally intensive than symmetric encryption. Using asymmetric encryption to encrypt all web traffic would slow down the connection considerably. Also, anything the website encrypted with its private key could be decrypted by anyone, because the matching public key is published in the site’s certificate. Because of these issues, symmetric encryption algorithms are used to encrypt the actual data travelling between client and server.
However, this raises the problem of how to share keys between the server (the website) and the client (web browser). Both must have the same key to encrypt and decrypt messages between them, and this key must be kept secure: only the client and the server should have this key.
Transport Layer Security (TLS, still sometimes called by the name of its predecessor, SSL) is a protocol for establishing encrypted sessions between a client and server. When a user connects to a website with TLS enabled, a series of steps takes place, known as the TLS Handshake. Here is how a TLS handshake is performed using RSA key exchange:
- After establishing an initial TCP Connection to the web server, the client sends what’s called a TLS Client Hello message to the server, which includes the encryption algorithms the client supports (called cipher suites), and a randomly generated number called the Client Random.
- The server responds with a Server Hello message, followed by the site’s X.509 certificate. The Server Hello includes the cipher suite the server has chosen from those the client supports, and its own random number, the Server Random.
- After validating the certificate, the client sends another random number, called the Premaster Secret. The client encrypts the Premaster Secret with the server’s public key, which was contained in the certificate. That way, only the server will be able to access this Secret.
- The server decrypts the premaster secret with its private key, and the client and server generate encryption keys using the client random, server random, and premaster secret. Since both parties are working with the same numbers, they end up with the same key. This key is unguessable to outsiders, because it was made with the premaster secret, which was encrypted before it was sent.
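The last step, where both sides turn the same three numbers into the same key, can be sketched as follows. This is a simplified stand-in: real TLS uses its own key derivation function and produces several keys, whereas this sketch uses a single HMAC-SHA256 call. The function name `derive_session_key` and the label `b"session key"` are made up for illustration.

```python
import hashlib
import hmac
import secrets


def derive_session_key(premaster: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """Simplified stand-in for TLS key derivation: mix all three values into one key."""
    return hmac.new(premaster, b"session key" + client_random + server_random,
                    hashlib.sha256).digest()


# Sent in the clear during the Client Hello and Server Hello.
client_random = secrets.token_bytes(32)
server_random = secrets.token_bytes(32)

# Chosen by the client and sent encrypted with the server's public key.
premaster_secret = secrets.token_bytes(48)

# Each side runs the same derivation on the same inputs...
client_key = derive_session_key(premaster_secret, client_random, server_random)
server_key = derive_session_key(premaster_secret, client_random, server_random)

# ...while an eavesdropper who saw both randoms still has to guess the premaster secret.
eavesdropper_key = derive_session_key(secrets.token_bytes(48), client_random, server_random)

print(client_key == server_key)         # both sides agree
print(client_key == eavesdropper_key)   # the outsider does not
```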
This is how older versions of TLS combined certificates, asymmetric encryption, and symmetric encryption to secure communications. There is a flaw with this process, however: the premaster secret that the symmetric keys are derived from is sent over the internet encrypted with the server’s public key. This is secure so long as the server’s private key stays safe. However, if the private key is ever stolen, an attacker could use it to decrypt recorded TLS sessions. Using the stolen private key, the attacker could decrypt the premaster secret from each recorded handshake, regenerate the symmetric keys, and read all of the data those keys protected. In other words, the web traffic is only as secure as the private key.
🔑 Diffie-Hellman key exchange
What was needed was a way to keep encrypted traffic secure even if private keys were compromised. Newer versions of TLS utilize a key exchange algorithm called Diffie-Hellman to securely generate symmetric keys.
Diffie-Hellman works by performing mathematical operations on a pair of secret numbers and a pair of public numbers. During a key exchange, the client and server start from two public numbers that both of them know. Each of them then generates its own secret random number and performs a practically irreversible mathematical operation on the public numbers with that secret number. The client and server then share the results of this operation with each other, and each performs another operation using the result it received and its own secret number. If you want a deeper explanation of the math involved, this is a fantastic article explaining it in more detail.
Because the math operations are practically irreversible, there is no feasible way to calculate what the secret number was based on the results. With this process, it is infeasible for an outsider to work out what the keys will be. At the end, both client and server will end up with the same number, which will be used as the basis of a symmetric encryption key. This is a far more secure way of generating keys. In fact, the latest version of TLS (TLS 1.3) removed the older RSA key exchange in favor of Diffie-Hellman. Ephemeral Diffie-Hellman is an even more secure version, in which the client and server generate new secret numbers for every session and discard them afterward. This provides what’s called Forward Secrecy: because no long-term key can be used to recover past session keys, stealing the server’s private key does not expose previously recorded traffic.
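Here is a small Python sketch of the exchange described above, using classic Diffie-Hellman with modular exponentiation. Treat it as a toy: the prime is far smaller than anything used in practice, real TLS typically uses standardized groups or elliptic curves, and the names `PRIME`, `BASE`, `new_keypair`, `shared_key`, and `run_session` are made up for illustration.

```python
import hashlib
import secrets

# The two public numbers. 2**127 - 1 is a known prime, but much too small for real use.
PRIME = 2**127 - 1
BASE = 3


def new_keypair() -> tuple[int, int]:
    """Pick a fresh secret number and compute the public value to share."""
    secret = secrets.randbelow(PRIME - 3) + 2
    public = pow(BASE, secret, PRIME)    # easy to compute, hard to reverse
    return secret, public


def shared_key(own_secret: int, peer_public: int) -> bytes:
    """Combine our secret with the other side's public value, then hash into a key."""
    shared = pow(peer_public, own_secret, PRIME)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def run_session() -> tuple[bytes, bytes]:
    """One ephemeral exchange: both sides use brand new secrets, then forget them."""
    client_secret, client_public = new_keypair()
    server_secret, server_public = new_keypair()
    # Only the public values cross the network.
    client_key = shared_key(client_secret, server_public)
    server_key = shared_key(server_secret, client_public)
    return client_key, server_key


first_client_key, first_server_key = run_session()
second_client_key, second_server_key = run_session()

print(first_client_key == first_server_key)    # both sides derive the same key
print(first_client_key == second_client_key)   # each session gets a different key
```

Because the secrets exist only inside `run_session`, nothing stored afterward could be used to rebuild an earlier session's key, which is the idea behind Forward Secrecy.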
Conclusion
I hope this post was informative. PKI and TLS are fundamental to security, and have gone a long way to making the web safer for everyone.