A mysterious email problem - solved

Two years ago I heard from a friend that they wanted to send an email to my wife, but the email never went through. Instead the friend received an automated message from their email provider (Gmail) informing them that the email delivery had permanently failed. The automated message contained this error information:

TLS Negotiation failed: FAILED_PRECONDITION: starttls error (71):
6945907163592:error:10000417:SSL routines:
OPENSSL_internal:SSLV3_ALERT_ILLEGAL_PARAMETER:
third_party/openssl/boringssl/src/ssl/tls_record.cc:594:SSL alert number 47

I had never seen anything like that before, and none of the contacts who regularly send us private or business messages had ever reported problems sending us email. I ran some tests: I checked the certificates used by the Exim MTA on my dedicated Linux server (they were in order), and I sent myself an email from my own Gmail account (the email arrived). So nothing seemed amiss.

I then tried to find out more about the root cause by searching the net for keywords from the error message above, but was ultimately unsuccessful. As a side effect, I learned the following about BoringSSL from its website:

BoringSSL is a fork of OpenSSL that is designed to meet Google’s needs.

So at this point it looked like this was a) a Gmail-specific issue that b) mysteriously affected only our friend (remember: I had been able to send myself emails from my own Gmail account). Lacking the time for further investigation, I dropped the case for the moment, after which it lay dormant for the next two years - until I was jolted by its reoccurrence a few days ago!

What happened was this: I tried to forward an email from my own Gmail account to myself, but the delivery failed with an error message almost identical to the one that our friend had seen two years ago:

TLS Negotiation failed: FAILED_PRECONDITION: starttls error (71):
123358874730944:error:10000417:SSL routines:
OPENSSL_internal:SSLV3_ALERT_ILLEGAL_PARAMETER:
third_party/openssl/boringssl/src/ssl/tls_record.cc:592:SSL alert number 47

Two years ago I had been able to send myself an email from my own Gmail account, and now it “suddenly” did not work anymore - what the heck!? This time I was determined to find the root cause so - unlike two years ago - I immediately consulted the logs on the server where the receiving MTA runs, and I found this in the MTA log files:

2024-08-31 19:48:22 TLS error on connection from mail-lj1-f174.google.com
[209.85.208.174] I=[82.195.228.21]:25 (gnutls_handshake):
A disallowed SNI server name has been received.

So at least I now had a better idea of what this was all about: either the Gmail server or my own server had for some reason used a wrong server name in the TLS handshake. I still did not know which of the two it was, nor which name it had used.
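
A log entry like the one above can be split into its parts mechanically, which helps when scanning a large log for similar failures. The following is a minimal Python sketch. The pattern is derived from this single sample only, so other Exim log lines may well need a different pattern, and the names `LOG_PATTERN` and `parse_tls_error` are made up for illustration.

```python
import re

# Pattern inferred from the one sample entry above (an assumption, not an
# official description of the Exim log format).
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"TLS error on connection from (?P<host>\S+) "
    r"\[(?P<remote_ip>[^\]]+)\] "
    r"I=\[(?P<local_ip>[^\]]+)\]:(?P<port>\d+) "
    r"\((?P<function>[^)]+)\): "
    r"(?P<message>.*)$"
)


def parse_tls_error(entry):
    """Return the fields of a 'TLS error on connection' entry, or None."""
    # Collapse line breaks and repeated spaces so a wrapped entry still matches.
    normalized = " ".join(entry.split())
    match = LOG_PATTERN.match(normalized)
    return match.groupdict() if match else None


sample = """2024-08-31 19:48:22 TLS error on connection from mail-lj1-f174.google.com
[209.85.208.174] I=[82.195.228.21]:25 (gnutls_handshake):
A disallowed SNI server name has been received."""

fields = parse_tls_error(sample)
print(fields["host"], "->", fields["message"])
```

The entry identifies the sending host and the local interface, but not the SNI name that was rejected, which is why more digging was needed.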

The next net search finally told me what “SSL alert number 47” means - two years ago my search-fu apparently had not been good enough to reveal this. The term “SSL alert” comes from the “SSL/TLS Alert Protocol”. The alert codes are defined in the TLS/SSL RFCs, for example in RFC 8446 for TLS 1.3. In this RFC one can see that number 47 is the alert with the symbolic name “illegal_parameter”. A bit further down, in section 6.2, the alert is described like this:

   illegal_parameter:  A field in the handshake was incorrect or
      inconsistent with other fields.  This alert is used for errors
      which conform to the formal protocol syntax but are otherwise
      incorrect.
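
The lookup from alert number to symbolic name can be sketched in a few lines of Python. The table below is only a subset of the alert descriptions listed in RFC 8446, and the names `TLS_ALERTS` and `alert_name` are made up for this illustration.

```python
import re

# Subset of the alert descriptions from RFC 8446, section 6 (not the full list).
TLS_ALERTS = {
    0: "close_notify",
    10: "unexpected_message",
    20: "bad_record_mac",
    40: "handshake_failure",
    42: "bad_certificate",
    47: "illegal_parameter",
    48: "unknown_ca",
    50: "decode_error",
    70: "protocol_version",
    80: "internal_error",
    112: "unrecognized_name",
    116: "certificate_required",
    120: "no_application_protocol",
}


def alert_name(error_text):
    """Extract 'SSL alert number N' from an error message and name the alert."""
    match = re.search(r"SSL alert number (\d+)", error_text)
    if match is None:
        return None
    number = int(match.group(1))
    return TLS_ALERTS.get(number, f"unknown alert {number}")


bounce = "third_party/openssl/boringssl/src/ssl/tls_record.cc:594:SSL alert number 47"
print(alert_name(bounce))
```

Feeding it the last line of the Gmail bounce message yields the symbolic name quoted above.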

Searching the RFC for more occurrences of “illegal_parameter” turned up various cases where the RFC says that the client or server (depending on the scenario) MUST abort the handshake with “illegal_parameter”. Unfortunately, none of these cases fits the “disallowed SNI server name” case that I was seeing. So even though I now had a better grasp of the Gmail-side error message, my quest for the exact nature of the root cause had not progressed substantially.

I then started to think how I could find out which SNI server name had been used. After some consideration, it seemed more logical that this was about the SNI server name of my server, i.e. the Gmail server had used a certain invalid SNI server name in the handshake that it expected to refer to my server.

So where would the Gmail server get this name from? The only place I could think of was the MX record of the recipient's domain. I had a look with this:

$ dig moser-naef.ch mx

and got this result:

[...]
;; ANSWER SECTION:
moser-naef.ch.  300  IN  MX  100  _dc-mx.7c74f8d3197f.moser-naef.ch.
[...]

This was indeed a very, very weird server name! Where the heck did this thing come from? Another net search led me to this page which suggested that the problem is caused by Cloudflare being configured to proxy for the actual mail server. Guess what? I am using Cloudflare for DNS hosting!
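
A check for this kind of name can be automated. The following Python sketch parses one MX line from dig's answer section and flags targets whose first label starts with `_dc-`. Treating that prefix as a marker for a proxy-generated name is an assumption based only on the single record shown above. The function names are made up, and `mx.example.org.` is a placeholder host, not a real server.

```python
def parse_mx_answer(line):
    """Split one MX line from dig's ANSWER SECTION into its fields."""
    fields = line.split()
    # Expected layout: name, TTL, class, type, preference, target
    if len(fields) != 6 or fields[3] != "MX":
        return None
    name, ttl, _record_class, _record_type, preference, target = fields
    return {
        "name": name,
        "ttl": int(ttl),
        "preference": int(preference),
        "target": target,
    }


def looks_proxy_generated(target):
    """Heuristic (assumption): a first label starting with '_dc-' is suspicious."""
    first_label = target.split(".", 1)[0]
    return first_label.startswith("_dc-")


record = parse_mx_answer(
    "moser-naef.ch.  300  IN  MX  100  _dc-mx.7c74f8d3197f.moser-naef.ch."
)
print(record["target"], looks_proxy_generated(record["target"]))
```

An ordinary target such as the placeholder `mx.example.org.` would not be flagged, so a check like this could run periodically against every mail domain one hosts.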

Checking the DNS setup for moser-naef.ch on Cloudflare revealed that the MX records were set to “DNS only” (which is correct), but the A record was set to “Proxied”. Comparing with the configuration of herzbube.ch (another domain of mine), I saw that the A record for herzbube.ch was set to “DNS only”. As expected, after changing the A record for moser-naef.ch to “DNS only” I was able to successfully send emails from Gmail to a moser-naef.ch recipient. Problem solved.

After returning from my brief sojourn to Revelation Space, three mysteries were left:

  • Why had I been able to send myself an email from my own Gmail account two years ago, but now not anymore? The only explanation I have for this is that two years ago I had been sloppy and had sent my test mails to a recipient in the herzbube.ch domain instead of in the moser-naef.ch domain. Although the theory is plausible, any proof for it is long gone.
  • How did the Cloudflare configuration of the A record for moser-naef.ch become set to “Proxied”? I do not have any recollection of doing so myself, especially since I did not even know about this configuration option before today. Is “Proxied” a default value and I did not pay attention when I set up the moser-naef.ch domain? This seems unlikely because I set up all four domains that I own at the same time, and only moser-naef.ch was mis-configured. The mystery remains.
  • Last, but definitely not least, why was only Gmail having problems delivering mails to moser-naef.ch, but nobody else in the world? For this I still do not have an answer. Is Gmail, or rather Google, more diligent than everybody else when it comes to following internet standards? On the other hand, Cloudflare also is not exactly a small company that can afford to be sloppy with configuring its network infrastructure. Here as well, the mystery remains.