Google Cloud Global HTTP Load Balancer with Let's Encrypt

Here’s a brief guide on setting up Let’s Encrypt autorenewal with Google Cloud Platform’s Global HTTP Load Balancer product. I recently set this up for my own website, because of the Comodo-Let’s Encrypt snafu a few weeks ago. Previously, my website used a wildcard certificate from Comodo. But I wanted to eliminate the need to manually renew and install a fresh certificate every year (and also cut a hundred buck-a-roos from my annual web hosting costs).

Overview

Most clients for Let’s Encrypt expect to run on a single server, which is responsible for both (1) SSL termination, and (2) web traffic. (We’re using the “http-01” ACME validation method in this guide, but there are other options.) For simple websites with just a single server, it would be no problem to run both SSL termination and web traffic from the same server. But for many public cloud customers, it’s not so simple.

In order to accomplish our goal of automatic renewal of Let’s Encrypt certificates, we need to:

  • Make sure that only 1 server attempts the renewal. Let’s Encrypt has some rate limiting, and if every instance in your fleet attempts renewal, it’d be both wasteful and error-prone.
  • Direct web traffic for /.well-known/acme-challenge/... to our renewal server.
  • Propagate the newly-signed certificate to all places that perform TLS termination.
  • Schedule this process to be repeated before the 90-day certificate expires.

Additionally, it would be useful if we could:

  • Notify the sysadmin when a certificate is successfully renewed
  • Alert the sysadmin if the public-facing certificate is about to expire

The second point is important, because our autorenewal schedule might fail for any reason, and the sysadmin needs some time to fix the issue before it causes a public-facing problem.

Capturing traffic to .well-known

In order to satisfy http-01, you’ll need to capture traffic to /.well-known/acme-challenge/... on every name in the Subject Alternative Name (SAN) list of the certificate you’re requesting. If you’re using Google’s Global HTTP Load Balancing, you can set this up in a URL map. But I just added a little bit of code to each of my NGINX server blocks:

location /.well-known {
  root /srv/acme;
  try_files $uri =404;
}

If you have default cache header settings for static files, it’d be a good idea to turn those off in this block.
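
Before pointing a real ACME client at this setup, it can help to confirm that the location block really serves files out of the web root on every domain. Here’s a rough sketch of such a self-check using only the standard library. The function name check_challenge_path and its parameters are made up for illustration, and it assumes the renewal server can write to the web root and reach each domain over HTTP:

```python
import os
import secrets
import urllib.request


def check_challenge_path(web_root, base_urls, timeout=5.0):
    """Write a throwaway file under the ACME challenge directory and fetch it
    back through each base URL. Returns the base URLs that failed."""
    challenge_dir = os.path.join(web_root, ".well-known", "acme-challenge")
    os.makedirs(challenge_dir, exist_ok=True)

    # Random file name and body, so a stale or cached response can't pass
    name = "selftest-" + secrets.token_hex(8)
    body = secrets.token_hex(16).encode("ascii")
    path = os.path.join(challenge_dir, name)
    with open(path, "wb") as f:
        f.write(body)

    failed = []
    try:
        for base in base_urls:
            url = "%s/.well-known/acme-challenge/%s" % (base.rstrip("/"), name)
            try:
                with urllib.request.urlopen(url, timeout=timeout) as resp:
                    if resp.read() != body:
                        failed.append(base)
            except OSError:  # URLError and HTTPError are OSError subclasses
                failed.append(base)
    finally:
        # Always clean up the test file
        os.remove(path)
    return failed
```

Calling something like check_challenge_path("/srv/acme", ["http://rogerhub.com", "http://www.rogerhub.com"]) should return an empty list if every name is routed correctly.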

Generate a private key and CSR

Normally, your Let’s Encrypt client would take care of this step. But you can do this however you want. Here’s some Python code that’ll do the trick:

import OpenSSL
import os

DOMAINS = ["rogerhub.com", "www.rogerhub.com"]  # etc
key_path = "..."
csr_path = "..."

# Create a new private key and certificate signing request
private_key = OpenSSL.crypto.PKey()
private_key.generate_key(OpenSSL.crypto.TYPE_RSA, 2048)
x509_req = OpenSSL.crypto.X509Req()
# Version 0 is the only version defined for CSRs
x509_req.set_version(0)
x509_req.set_pubkey(private_key)

# Prepare x509 request subject
x509_req.get_subject().CN = DOMAINS[0]

# Prepare x509 SAN (pyOpenSSL expects bytes on Python 3)
san_value = ",".join("DNS:%s" % domain for domain in DOMAINS)
x509_req_san = OpenSSL.crypto.X509Extension(
    b"subjectAltName",
    critical=False,
    value=san_value.encode("ascii"),
)
x509_req.add_extensions([x509_req_san])

# Self-sign CSR
x509_req.sign(private_key, "sha256")

# Protect the private key: create the file readable only by its owner
previous_umask = os.umask(0o077)
try:
    with open(key_path, "wb") as key_file:
        key_file.write(OpenSSL.crypto.dump_privatekey(OpenSSL.crypto.FILETYPE_PEM, private_key))
finally:
    os.umask(previous_umask)

# The PEM dumps are bytes, so the files are opened in binary mode
with open(csr_path, "wb") as csr_file:
    csr_file.write(OpenSSL.crypto.dump_certificate_request(OpenSSL.crypto.FILETYPE_PEM, x509_req))

Most of the details in your CSR don’t really matter. Let’s Encrypt will ignore your values anyway. You just need to make sure that:

  • The CSR is signed with the private key that matches the public key embedded in the CSR
  • Your Subject Alternative Name is correct

If this is the first time you’re using Let’s Encrypt, you’ll also need to create an RSA keypair for your account (separate from the certificate’s key) and register your email address with them.

ACME protocol

I’m using this implementation of the ACME protocol. It requires a web root (I use the /srv/acme that’s in my NGINX config from before), your CSR, and your Let’s Encrypt account’s private key (NOT the one for the certificate itself). If it succeeds, it’ll give you your freshly-signed certificate in PEM format.

Linking the certificate chain

Once you have your Let’s Encrypt certificate, you need to append the Let’s Encrypt X3 intermediate certificate to finish the chain of trust. Make sure you grab the intermediate certificate, or else your certificate won’t be trusted by clients.
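
Appending the intermediate is just PEM concatenation, with your own certificate first. A minimal sketch, assuming both certificates are already loaded as PEM text (build_fullchain is a name I made up for illustration):

```python
PEM_HEADER = "-----BEGIN CERTIFICATE-----"


def build_fullchain(leaf_pem, intermediate_pem):
    """Concatenate PEM text: leaf certificate first, then the intermediate."""
    parts = []
    for pem in (leaf_pem, intermediate_pem):
        pem = pem.strip()
        if not pem.startswith(PEM_HEADER):
            raise ValueError("expected a PEM-encoded certificate")
        # Exactly one newline after each block, so the END line of one
        # certificate never runs into the BEGIN line of the next
        parts.append(pem + "\n")
    return "".join(parts)
```

The stripping is there because a missing trailing newline in the first file is an easy way to end up with a chain that won’t parse.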

Certificates on Google Cloud

On Google’s Global HTTP Load Balancer, each HTTPS target proxy is linked to a certificate. You can use the gcloud tool (pre-installed on GCE images) to update your target proxies with new certificates. But make sure that:

  • Your version of gcloud is relatively recent. The target-https-proxies commands were only recently added, so they aren’t present in older versions of gcloud. (Not to be confused with target-http-proxies, which manages plaintext HTTP sites.)
  • Your renewal server has API access to Compute Engine resources. You can configure this in your GCE templates, or on an instance-by-instance basis.

You can query gcloud for the current certificate, to check if it’s about to expire. If it is, you can upload your new certificate, then use the target-https-proxies update command to switch over to the new certificate. You won’t see the changes immediately, but soon, the renewed certificate should be installed globally.
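
The upload-then-switch step could look roughly like this. It’s a sketch that only builds the two command lines rather than running them. The function name and the certificate and proxy names are placeholders, and the flag names are the ones I’d expect from a recent gcloud, so check them against the --help output of your version:

```python
def gcloud_rotate_commands(cert_name, proxy_name, chain_path, key_path):
    """Return the two gcloud invocations (as argv lists) that upload a new
    certificate and then point an HTTPS target proxy at it."""
    upload = [
        "gcloud", "compute", "ssl-certificates", "create", cert_name,
        "--certificate", chain_path,   # full chain: leaf + intermediate
        "--private-key", key_path,
    ]
    switch = [
        "gcloud", "compute", "target-https-proxies", "update", proxy_name,
        # Assumption: older gcloud releases may spell this --ssl-certificate
        "--ssl-certificates", cert_name,
    ]
    # Order matters: the certificate must exist before the proxy can use it
    return [upload, switch]
```

Each list can be passed to subprocess.run(cmd, check=True) in order, so a failed upload stops the script before it touches the proxy.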

There’s a usage quota of 30 SSL certificates (at least on my account). But if you aren’t too aggressive with your renewals, you won’t have any problems even if your renewal script doesn’t clean out expired certificates. It’d be a good idea to keep around at least 1 old certificate, just in case something goes wrong and you need to revert.
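
If you do want your script to clean up after itself, the selection logic is small. Here’s a sketch that assumes you’ve already listed your certificates as (name, creation timestamp) pairs, with timestamps that sort correctly as plain values (datetimes, or ISO 8601 strings in the same timezone). The function name and parameters are hypothetical:

```python
def certs_to_delete(certs, in_use, keep_old=1):
    """Pick certificate names that are safe to delete.

    certs:    list of (name, creation_timestamp) tuples
    in_use:   set of names currently attached to a target proxy
    keep_old: how many unused certificates to keep for rollback
    """
    # Never consider certificates that a proxy is still serving
    unused = [c for c in certs if c[0] not in in_use]
    # Newest first, so the slice below keeps the most recent spares
    unused.sort(key=lambda c: c[1], reverse=True)
    return [name for name, _ in unused[keep_old:]]
```

With keep_old=1, the most recent unused certificate stays around as the rollback option and everything older is returned for deletion.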

Monitoring

Here’s a bit of Python to retrieve a website’s certificate expiration date from an external monitoring server:

import datetime
import socket
import ssl

def get_cert_expiration(hostname):
    # Format of the certificate's 'notAfter' field, which is given in GMT
    ssl_date_fmt = r'%b %d %H:%M:%S %Y %Z'
    context = ssl.create_default_context()
    conn = context.wrap_socket(
        socket.socket(socket.AF_INET),
        server_hostname=hostname,
    )
    conn.settimeout(3.0)
    try:
        conn.connect((hostname, 443))
        ssl_info = conn.getpeercert()
    finally:
        # Close the socket even if the connection or handshake fails
        conn.close()
    # Returns a naive datetime
    return datetime.datetime.strptime(ssl_info['notAfter'], ssl_date_fmt)

Just plug this into your preferred alerting system and you’ll have some additional assurance that your autorenewal is working.
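
As one example of the plugging-in part, here’s a sketch that turns an expiration date into a status code, using the exit-code convention from Nagios-style plugins (0 is OK, 1 is WARNING, 2 is CRITICAL). The function name and the 21-day and 7-day thresholds are arbitrary values I picked for illustration:

```python
import datetime

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios-style plugin exit codes


def expiry_status(not_after, now, warn_days=21, crit_days=7):
    """Map the time left before `not_after` to a status code."""
    remaining = not_after - now
    # Already-expired certificates have a negative remainder: CRITICAL
    if remaining <= datetime.timedelta(days=crit_days):
        return CRITICAL
    if remaining <= datetime.timedelta(days=warn_days):
        return WARNING
    return OK
```

A check script could pass the result of get_cert_expiration as not_after, a naive UTC datetime as now, and hand the returned code to sys.exit. Whatever thresholds you pick, the warning one should fire later than your scheduled renewal, so that an alert means the renewal actually failed.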

Summary

It has been a few weeks since RogerHub.com switched to Let’s Encrypt certificates. Instead of a wildcard certificate, I’m forced to enumerate all my subdomains. But they all redirect to rogerhub.com anyway, so no big deal. It’s incredible that free, automated certificate renewal was just a pipe dream a few years ago. As more of the web turns on TLS, who will standardize captive portals? (Many of them will become horribly broken.)