I broke authentication, but it’s not my fault.
How this came about
This weekend I was trying to log in to Matrix (which uses OpenLDAP as its password store) on a new device and it was failing. Looking into the logs, it was complaining about an expired TLS certificate. Weird. First, the certificate was set up with cert-manager to renew automatically with Let’s Encrypt. Second, the certificate had been expired for a year and Synapse had never complained about it before.
It shouldn’t be surprising that I made a few mistakes initially setting things up since it was one of the first services I set up on the Kubernetes cluster. I believe the Let’s Encrypt DNS challenge was failing because the Cloudflare API key was in the default namespace and not in the same namespace (ldap) as the certificate request. How it worked in the first place, I don’t know.
Another problem was that cert-manager had gone through a lot of version changes, from 0.14.0 to 1.5.0, including a move from the v1beta1 API to the v1 API. The configuration would need to be updated regardless.
Fixing the Cert
After making the Cloudflare API key available in the right namespace, I created an updated cert issuer:
apiVersion: v1
kind: List
items:
- apiVersion: cert-manager.io/v1
  kind: Issuer
  metadata:
    name: letsencrypt-ldap
    namespace: ldap
  spec:
    acme:
      email: e-mail
      server: https://acme-v02.api.letsencrypt.org/directory
      solvers:
      - dns01:
          cloudflare:
            apiKeySecretRef:
              key: apikey
              name: cloudflare-apikey-secret
            email: cloudflare-email
        selector:
          dnsNames:
          - ldap.domain.tld
and requested a new certificate:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ldap-cert-2021
  namespace: ldap
spec:
  commonName: ldap.domain.tld
  dnsNames:
  - ldap.domain.tld
  issuerRef:
    name: letsencrypt-ldap
  privateKey:
    algorithm: RSA
    size: 4096
  secretName: ldap-tls
The certificate was successfully issued and stored in a secret called ldap-tls with a tls.key and a tls.crt key, but now OpenLDAP would not start, complaining about a missing ca.crt file.
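To see this state for yourself, you can check which certificates the chain in tls.crt holds. The kubectl line for pulling the chain out of the secret is shown as a comment; the runnable part below uses a stand-in two-certificate bundle for illustration:

```shell
# In practice, pull the chain out of the ldap-tls secret first:
#   kubectl -n ldap get secret ldap-tls -o jsonpath='{.data.tls\.crt}' | base64 -d > chain.pem
# Stand-in bundle with a leaf and one intermediate:
printf -- '-----BEGIN CERTIFICATE-----\nleaf\n-----END CERTIFICATE-----\n' > chain.pem
printf -- '-----BEGIN CERTIFICATE-----\nintermediate\n-----END CERTIFICATE-----\n' >> chain.pem

# Count the certificates in the bundle; with ACME, the leaf and the
# intermediates are all concatenated into tls.crt:
grep -c 'BEGIN CERTIFICATE' chain.pem   # → 2
```

Note that there is no ca.crt anywhere in the secret, which is exactly what OpenLDAP was complaining about.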
Investigating OpenLDAP
While investigating why this was occurring (and didn’t happen with the old, expired cert), I discovered some details about how OpenLDAP is configured:
- The TLS secret is mounted as a volume in a sidecar called openldap-init-tls at /tls. The two keys tls.key and tls.crt appear as files in the volume.
- There is an ephemeral in-memory volume called certs mounted as /certs.
- When openldap-init-tls runs, it copies /tls to /certs so that the running openldap container operates only on a copy.
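The arrangement above looks roughly like this in the pod spec. This is a sketch, not the actual chart template; the image and the exact copy command are assumptions, while the container, volume, and secret names come from the description above:

```yaml
# Sketch of the relevant pod spec pieces (not the actual chart template)
initContainers:
- name: openldap-init-tls
  image: busybox                        # assumed image
  command: ["sh", "-c", "cp /tls/* /certs/"]  # copy the secret contents to the scratch volume
  volumeMounts:
  - name: tls
    mountPath: /tls
  - name: certs
    mountPath: /certs
volumes:
- name: tls
  secret:
    secretName: ldap-tls                # tls.crt and tls.key appear as files
- name: certs
  emptyDir:
    medium: Memory                      # ephemeral in-memory volume
```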
Looking at the logs in OpenLDAP, it was also looking for ca.crt, which contains the certificate authority’s certificate. With the ACME protocol, the entire certificate chain is contained in tls.crt. However, OpenLDAP wants a separate ca.crt because this is used if you are using a custom CA. For example, you may want clients to present a valid cert signed by your CA.
The normal solution would be to extract the CA certificate and save it as a separate key (file) in the secret (volume). However, this would be a manual step that might need to be repeated every time the certificate renews, which defeats the purpose of using cert-manager.
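The manual extraction itself is straightforward, since the issuing CA is the last certificate in the bundle. A sketch of that step (again using a stand-in bundle; in practice the chain comes out of the ldap-tls secret via kubectl):

```shell
# Stand-in for the chain pulled from the secret, e.g.:
#   kubectl -n ldap get secret ldap-tls -o jsonpath='{.data.tls\.crt}' | base64 -d > chain.pem
printf -- '-----BEGIN CERTIFICATE-----\nleaf\n-----END CERTIFICATE-----\n' > chain.pem
printf -- '-----BEGIN CERTIFICATE-----\nissuer\n-----END CERTIFICATE-----\n' >> chain.pem

# Keep only the last certificate in the bundle -- the issuing CA.
# The buffer is reset at each BEGIN line, so only the final block survives:
awk '/BEGIN CERTIFICATE/{buf=""} {buf=buf $0 "\n"} END{printf "%s", buf}' chain.pem > ca.crt

cat ca.crt
```

The pain point is not this one-liner but having to re-run it (or automate it) on every renewal, which is why an operator is the better fit.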
The Solution
Once I finally understood what the problem really was, I found that I was not the only one having this issue. I don’t quite understand the reason, but cert-manager does not add a ca.crt key to Let’s Encrypt certificate secrets. However, this is an improvement over adding an empty ca.crt key, as described in the issue.
Fortunately, since the CA certificate can be extracted from the certificate chain, someone wrote a very handy operator which does just that.
It was a very simple installation, and now the ca.crt key exists in the secret. OpenLDAP is able to start with all of the information it needs about its TLS certificate.
One Final Problem
In spite of fixing “the problem”, I hit one other snag: the certificate was created with only the external DNS name. I did not specify any Subject Alternative Names (SANs) matching the cluster address of OpenLDAP, so Synapse was still complaining that the TLS certificate did not match.
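If I do fix it later, the change would be to add the in-cluster names to the Certificate’s dnsNames. The service name below is a hypothetical example, not the actual Service in my cluster:

```yaml
# Additional SANs on the Certificate spec (service name is hypothetical;
# use the real Service name in the ldap namespace):
spec:
  commonName: ldap.domain.tld
  dnsNames:
  - ldap.domain.tld
  - openldap.ldap.svc.cluster.local
  - openldap.ldap.svc
```

cert-manager would then reissue the certificate with SANs that match what Synapse actually connects to.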
I may still fix that at some point, but I found it was simpler to use the standard, non-TLS port since no connections are allowed from outside the cluster. Both Keycloak and Synapse work without issue now.
Hail To The Documentation
There’s one primary reason I keep this blog updated. Going back and referencing how I did something has been a huge help when things break. Now that I’ve learned something new, I can reference this update as well when it inevitably breaks again.
Have you run into this (or a similar) issue before? Join the Matrix room and let’s talk about your experience.