FAQ: Why proper TLS Certificates for Email Ports matter

Author: Oliver Schranz | CTO

TLS certificates are ubiquitous in modern communication. We most commonly encounter them when visiting webpages, although we do not realize it when all mechanisms work as expected. Once in a while though, our browsers stop us from visiting a page because of a certificate problem that we as users are asked to resolve: either we "accept the risk" or flee "back to safety". Validity checking is fully automated and the decision making is left to the user, easy enough.

However, looking at the average certificate we receive from email ports, the picture looks vastly different. Expired, self-signed and otherwise invalid certificates are quite common, so at this point we have to ask ourselves: what is so different? Why do we see invalid certificates so rarely for HTTPS but so often for email protocols like SMTP(S), POP3S and IMAPS?

Unfortunately, the email ecosystem is quite complex and requires different solutions, even though the problems sound familiar at first:

  1. Adoption of HTTPS was heavily pushed by large companies with a major impact on the modern web. Google, for example, signaled in 2014 that correct usage of HTTPS is a positive ranking factor for their search, and in 2018 shipped an update to their Chrome browser (and so did Mozilla, Apple and other Browser vendors) that would render all non-HTTPS pages as "not secure". We only partially see similar efforts for the email ecosystem.
  2. The regular use case of web traffic is that users request and interact with content provided by servers, so human to machine interaction. In case of an invalid certificate, we can fall back to asking the user how to proceed. For emails, we additionally have machine to machine communication when emails are transmitted from the sender's to the receiver's server. A certificate failure here needs to be resolved without human intervention.
  3. Email is still considered the backbone of modern business communication where not receiving emails from clients can have major impacts on businesses. Therefore, we have a major bias towards always delivering the email when being faced with a decision between robustness (delivering) and security (aborting).

 

While both ecosystems come with their own set of peculiarities, the above points give us a good indication of why it makes sense to take a closer look at the intersection of email protocols and TLS.

Delivery of an Email

The following picture shows a simplified three-step process of sending an email:

[Figure: FAQ_broken_certificates_on_e-mail_eng2_SMALLER.jpg, simplified three-step delivery of an email]

 

  1. In the "submission" stage, the sender's mail user agent (MUA) transmits the email to the user's own email server (mail transfer agent, MTA) via an SMTP connection to either port 587 or 465. Here a human is still involved and the process is synchronous, i.e., it will succeed or fail and we will immediately receive the response.
  2. The sender's MTA now identifies the recipient's MTA and forwards the email on port 25 via SMTP. This is pure machine-to-machine communication where the user is out of the loop. From the sender's perspective, this is asynchronous: once we have submitted our message in step one, it is not guaranteed that the email is delivered immediately. Failures, if they are communicated at all, are delivered asynchronously. A popular example: sending to an email address that the receiving MTA does not recognize (maybe due to a typo) will result in our MUA receiving a "bounce" message from our MTA that tells us the email could not be delivered.
  3. At some point after the email arrived at the receiver's MTA, the receiver's MUA pulls an updated list of emails and receives the new message. This is again asynchronous from a sender's perspective since we do not know if and when the MUA is active again. For receiving emails, we are typically using the IMAP(S), POP3(S) or Exchange ActiveSync (EAS) protocols.
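The three hops can also be written down as a small lookup table. This is only a sketch: the names `Hop` and `DELIVERY_PATH` are made up for illustration, and the retrieval ports (143/993 for IMAP, 110/995 for POP3, 443 for EAS over HTTPS) are the usual defaults rather than something stated in the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    name: str
    initiator: str
    responder: str
    protocols: tuple
    ports: tuple
    human_in_loop: bool  # can a user be asked when the certificate is invalid?


# Hypothetical summary of the three steps described above.
DELIVERY_PATH = (
    Hop("submission", "sender MUA", "sender MTA", ("SMTP",), (587, 465), True),
    Hop("relay", "sender MTA", "recipient MTA", ("SMTP",), (25,), False),
    Hop("retrieval", "recipient MUA", "recipient mail server",
        ("IMAP(S)", "POP3(S)", "EAS"), (143, 993, 110, 995, 443), True),
)

# The hop where a certificate failure must be resolved without a human.
unattended = [hop.name for hop in DELIVERY_PATH if not hop.human_in_loop]
print(unattended)
```

Only the relay hop has nobody to ask, which is why it gets most of the attention in the rest of this article.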

 

So where do certificates come into play now?

Fair question. Certificates are (or at least can be) involved in every single one of those three steps, but in different ways:

In steps one and three, the MUA (client) initiates the connection to the MTA (server). Following the usual TLS flow, the server presents its certificate to the client to allow it to (a) validate the authenticity of the server and (b) initiate an encrypted connection. Once again, with proper certificates in place, everything works as expected. If, however, an invalid certificate is presented, the MUA prompts the user for a decision. To what extent leaving security decisions to end users is a good idea is a topic for another time, though.
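To make the difference between "validate" and "accept the risk" concrete, here is a minimal sketch using Python's standard `ssl` module. The variable names are made up for illustration, and real mail clients of course have their own TLS stacks and settings.

```python
import ssl

# Default client context: validates the certificate chain against the
# system trust store and checks that the certificate matches the hostname.
strict_ctx = ssl.create_default_context()

# Roughly what "accept the risk" amounts to: the connection is still
# encrypted, but the client no longer knows who is on the other end.
lenient_ctx = ssl.create_default_context()
lenient_ctx.check_hostname = False       # has to be disabled first
lenient_ctx.verify_mode = ssl.CERT_NONE  # then chain validation can be turned off

print(strict_ctx.verify_mode, strict_ctx.check_hostname)
print(lenient_ctx.verify_mode, lenient_ctx.check_hostname)
```

Both contexts yield an encrypted channel. Only the strict one tells the client whether it is actually talking to the intended server.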

In step two, MTAs are talking to each other without a human in the loop. The sender initiates a connection on SMTP port 25 to the receiver. If they are communicating over TLS, then the receiver will present their certificate as usual and we behave the same as in steps one and three. In theory, if we want to authenticate the sender as well, it could present a client certificate to the receiver. Sounds like a good plan to prevent spam from happening, right? However, solving this via TLS client certificates is not a common practice and the email ecosystem rather uses mechanisms like SPF (see my previous FAQ article here) instead.

What happens if we encounter an invalid certificate? The sender needs to make a decision, and terminating the connection implies not delivering the email, which is often not considered an option (robustness failure). Ignoring the glaring security problem, on the other hand, preserves robustness against misconfigurations but is a security failure. Keep in mind, though, that an invalid certificate could be the result of a server misconfiguration or could have been created by a currently active Man-in-the-Middle (MitM) attacker. So in practice mail servers typically favor robustness at the expense of security by either ignoring validation problems or not checking the presented certificate at all. Here we have a real MitM problem, so let's see how it actually works:

Short intermezzo: Man-in-the-Middle attacks

The name Man-in-the-Middle comes from the visual picture of an attacker standing in the middle of a communication path between two benign parties. In the tradition of security research, let's call the communicating parties Alice and Bob, plus the evil attacker Eve. The following figure depicts a successful MitM attack on a TLS connection in case the certificate is not properly validated.

[Figure: FAQ_broken_certificates_on_e-mail_eng_SMALLER.jpg, MitM attack on a TLS connection without proper certificate validation]

Alice wants to communicate with Bob via a TLS-secured connection. For this example, let's assume Alice is a user that wants to send an email (MUA) and Bob is Alice's own mailserver (MTA). In order to authenticate the receiving side and bootstrap the encryption, Bob offers his Certificate B. If we now assume an active network attacker, Eve can intercept and either pass through, modify or even drop each message between Alice and Bob. This is a common attacker model that has proven to be quite realistic (examples: ISPs, vendors, middleboxes). So what happens is that instead of forwarding Bob's Certificate B, Eve provides her own Certificate E to Alice. The goal is that Alice establishes a fully-secured connection to Eve using Certificate E and Eve establishes a fully-secured connection to Bob using Certificate B. In theory, both connections are properly secured by TLS and encrypted. While these connections thwart passive attackers, so-called "eavesdroppers" that just record but do not modify traffic, they are completely useless in defending against an active attacker like Eve. TLS only encrypts data in transit, so at the endpoints of each connection the content is available in plain text. Since Eve terminates both connections, she can read and modify any content, such as emails in our example.

At this point, certificate validation could save the day. The assumption is that only Bob has valid certificates for the mailserver Alice wants to use, because only Bob can prove control over the server (to be more precise: the server's domain). Therefore, Eve can only provide an invalid certificate, one she signed herself for example. If Alice properly validates the received certificate, she will stop the communication because Certificate E is not valid for the server and domain she wants to talk to.

Now imagine MUAs are typically not validating certificates. If even Bob as the legitimate receiver provides an invalid certificate (maybe because few clients are validating anyway), Alice absolutely needs to accept the invalid certificate or otherwise they cannot communicate. And in the world of emails, this implies not transmitting a potentially important email.

Disabling certificate validation, however, will allow Eve to fully erode secrecy, integrity and authenticity guarantees that we typically expect from TLS connections.
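The Alice, Bob and Eve scenario can be played through with a deliberately simplified toy model. Nothing here does real cryptography: `Cert`, `TRUSTED_ISSUERS`, `is_valid_for` and `alice_connects` are hypothetical names, and "validation" is reduced to a trusted issuer plus a matching hostname (no signatures, no expiry dates).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str  # hostname the certificate claims to be valid for
    issuer: str   # who signed it


TRUSTED_ISSUERS = {"Example Trusted CA"}  # hypothetical trust store


def is_valid_for(cert: Cert, expected_host: str) -> bool:
    """Toy validation: trusted issuer and matching hostname only."""
    return cert.issuer in TRUSTED_ISSUERS and cert.subject == expected_host


def alice_connects(presented: Cert, expected_host: str, validate: bool) -> str:
    """Return what Alice does when the other side presents a certificate."""
    if validate and not is_valid_for(presented, expected_host):
        return "aborted"
    return "connected"


host = "mail.bob.example"
cert_b = Cert(subject=host, issuer="Example Trusted CA")  # Bob's certificate
cert_e = Cert(subject=host, issuer="Eve (self-signed)")   # Eve's substitute

print(alice_connects(cert_b, host, validate=True))
print(alice_connects(cert_e, host, validate=True))
print(alice_connects(cert_e, host, validate=False))
```

With validation switched on, Eve's certificate leads to an aborted connection. With validation switched off, Alice connects to Eve exactly as she would to Bob.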

Ok so proper certificates and we are good, right?

Kind of... the thing about security is that we are never done, it's all about raising the bar. Let's see where we stand now:

 

  • Using TLS with invalid certificates: protects against passive eavesdroppers
  • Using TLS with correct and validated certificates: additionally protects against active attackers providing their own, invalid certificates

 

Great! If we look at TLS in isolation, we are done here. Unfortunately, the email ecosystem is a bit more complicated than that.

Let's take SMTP as an example, but the same holds for IMAP and POP3. If we establish a TLS connection from the very start, things are indeed looking good. We call this approach mandatory or implicit TLS. However, email protocols were created as plaintext protocols in the first place and their ports were reserved for exactly that. An attempted solution to keep using the same protocols on the same ports was to introduce the STARTTLS command. It allows one communication partner to signal to the other side that it wants to switch to a secure channel. Afterwards, client and server negotiate which versions and configurations each of them supports to finally settle on a common set of parameters. This effectively allows email servers to adopt TLS in a backwards-compatible way by still accepting plaintext connections and optionally allowing an upgrade to TLS. This is called opportunistic TLS. If and only if both parties want to communicate via TLS and can agree on a common set of security parameters, then they switch to a secure channel. Otherwise, the connection stays plaintext, once again choosing robustness over security.
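In Python's standard `smtplib`, the two approaches look roughly like this. This is a sketch, not a full mail client: the function names are made up, the default ports follow the submission ports mentioned earlier, and the STARTTLS variant deliberately refuses to continue in plaintext, which is stricter than purely opportunistic TLS.

```python
import smtplib
import ssl


def connect_implicit_tls(host: str, port: int = 465) -> smtplib.SMTP_SSL:
    """Mandatory/implicit TLS: the TLS handshake happens before any SMTP command."""
    ctx = ssl.create_default_context()  # validates chain and hostname
    return smtplib.SMTP_SSL(host, port, context=ctx, timeout=10)


def connect_starttls(host: str, port: int = 587) -> smtplib.SMTP:
    """Start in plaintext, then upgrade via STARTTLS if the server offers it."""
    ctx = ssl.create_default_context()
    conn = smtplib.SMTP(host, port, timeout=10)
    conn.ehlo()  # plaintext: the capability list is not protected
    if not conn.has_extn("starttls"):
        conn.quit()
        raise RuntimeError("server did not offer STARTTLS; refusing plaintext")
    conn.starttls(context=ctx)  # switch to the encrypted channel
    conn.ehlo()                 # re-identify over TLS
    return conn
```

Calling either function requires a reachable mail server, so no example output is shown. The point is the ordering: in the second function, the capability list arrives before any encryption is in place.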

The weakness of this approach is that the request to switch to a secure channel is transmitted in plaintext. If we now revisit our active attacker model, we see that Eve can intercept, modify and drop messages arbitrarily.

When Alice sends a STARTTLS command to Bob, Eve can use a so-called "downgrade attack": she can either modify Bob's response to say that Bob does not support TLS at all, or interfere with Alice's and Bob's negotiation of security parameters and fake a situation where the two do not share a common set of parameters. If, for example, Eve pretends Alice only supports TLS 1.2 and Bob only supports TLS 1.3, then the negotiation fails, a secure connection cannot be established and we fall back to plaintext communication.
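The first variant of the downgrade attack is easy to simulate without any network. The following sketch models Bob's capability list as a plain list of strings. The function names and the sample capabilities are hypothetical, and a real attacker would rewrite bytes on the wire rather than Python lists.

```python
def eve_strips_starttls(ehlo_lines):
    """Active attacker: forward the server's capability list minus STARTTLS."""
    return [line for line in ehlo_lines if line.upper() != "STARTTLS"]


def client_channel(ehlo_lines, require_tls):
    """Decide which channel the client ends up using."""
    offered = any(line.upper() == "STARTTLS" for line in ehlo_lines)
    if offered:
        return "tls"
    if require_tls:
        return "abort"    # security over robustness
    return "plaintext"    # robustness over security (opportunistic TLS)


bob_reply = ["mail.bob.example", "SIZE 35882577", "STARTTLS", "8BITMIME"]
tampered = eve_strips_starttls(bob_reply)

print(client_channel(bob_reply, require_tls=False))
print(client_channel(tampered, require_tls=False))
print(client_channel(tampered, require_tls=True))
```

An opportunistic client cannot tell a tampered reply from a server that simply does not support TLS, so it silently falls back to plaintext. Only a client that insists on TLS notices that something is wrong, at the price of not delivering.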

Protecting against downgrade attacks

Two mechanisms have been commonly suggested to solve this problem. First, a standard called SMTP MTA Strict Transport Security (short: MTA-STS) allows a domain owner (i.e., Bob) to publicly state his preferences concerning TLS connections. A DNS TXT record for the domain signals that a policy exists, and the policy file itself is fetched from a well-known HTTPS URL on that domain. In it, Bob can state, for example, that he only wants encrypted connections to a certain set of mail server hostnames. Since the policy also includes a time-to-live (max_age), the client can keep this information around for a while so it does not have to be requested every single time.
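To give an idea of what such a policy looks like, here is a small parser sketch. The key/value layout (`version`, `mode`, `mx`, `max_age`) follows the MTA-STS policy format, while the domain names, the sample values and the helper names `parse_mta_sts_policy` and `mx_allowed` are made up for illustration. A real implementation would also have to fetch the file over HTTPS and validate it more strictly.

```python
SAMPLE_POLICY = """\
version: STSv1
mode: enforce
mx: mail.example.com
mx: *.example.net
max_age: 604800
"""


def parse_mta_sts_policy(text: str) -> dict:
    """Parse 'key: value' lines; 'mx' may appear several times."""
    policy = {"mx": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "mx":
            policy["mx"].append(value.lower())
        elif key == "max_age":
            policy["max_age"] = int(value)  # lifetime in seconds
        else:
            policy[key] = value
    return policy


def mx_allowed(host: str, patterns: list) -> bool:
    """A leading '*.' matches exactly one additional label on the left."""
    host = host.lower().rstrip(".")
    for pattern in patterns:
        if pattern.startswith("*."):
            suffix = pattern[1:]  # e.g. ".example.net"
            label = host[: -len(suffix)]
            if host.endswith(suffix) and label and "." not in label:
                return True
        elif host == pattern:
            return True
    return False


policy = parse_mta_sts_policy(SAMPLE_POLICY)
print(policy["mode"], policy["max_age"], policy["mx"])
```

A sending server that honors this policy would only deliver to hosts accepted by `mx_allowed`, and only over TLS with a valid certificate, for as long as `max_age` has not run out.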

This already improves upon the current situation, because clients that support MTA-STS will know that Bob expects TLS with a valid certificate and, if the policy says so, will refuse to fall back to plaintext, hence defending against downgrade attacks. However, there are two major downsides here:

 

  1. The client needs to implement and follow this mechanism. It's completely optional for Bob to publish a policy and for Alice to implement support for it.
  2. Talking about an active network attacker, Eve can unfortunately also intercept and modify the DNS response and, e.g., pretend there is no policy defined. This will only work if the client does not have a cached policy from the past, effectively making this a TOFU (trust-on-first-use) mechanism: if Alice's first connection to Bob is legitimate, Eve will not be able to intercept future connections because Alice will remember the policy for a while. If, however, Eve already intercepts the first connection between Alice and Bob, then the mechanism is not helping us here. Hence the name, trust-on-*first*-use: the first connection sets the stage.
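The trust-on-first-use behavior can be illustrated with a tiny policy cache. This is a sketch under simplifying assumptions: `PolicyCache` and `effective_mode` are hypothetical names, time is passed in as plain numbers, and a hidden policy is modeled as `None`.

```python
class PolicyCache:
    """Remembers a domain's policy mode until its max_age runs out."""

    def __init__(self):
        self._entries = {}  # domain -> (mode, expires_at)

    def remember(self, domain, mode, max_age, now):
        self._entries[domain] = (mode, now + max_age)

    def lookup(self, domain, now):
        entry = self._entries.get(domain)
        if entry and now < entry[1]:
            return entry[0]
        return None


def effective_mode(cache, domain, fetched_mode, max_age, now):
    """fetched_mode is None when no policy was found (or Eve hid it)."""
    if fetched_mode is not None:
        cache.remember(domain, fetched_mode, max_age, now)
        return fetched_mode
    return cache.lookup(domain, now) or "none"


# Case 1: Eve already interferes with the very first contact.
fresh = PolicyCache()
first_contact_attacked = effective_mode(fresh, "bob.example", None, 0, now=0)

# Case 2: the first contact is legitimate, Eve shows up later.
warm = PolicyCache()
effective_mode(warm, "bob.example", "enforce", 86400, now=100)
later_attack = effective_mode(warm, "bob.example", None, 0, now=200)
after_expiry = effective_mode(warm, "bob.example", None, 0, now=100 + 86400)

print(first_contact_attacked, later_attack, after_expiry)
```

As long as a cached policy is still valid, hiding the DNS record achieves nothing. On first contact, or after the cached entry has expired, Alice has nothing to fall back on.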

 

Good news is, we can do better than that. DANE is a security mechanism that allows servers to publicly announce their TLS certificates via DNS. To avoid the TOFU problem we have with MTA-STS, it utilizes DNSSEC, a set of DNS extensions that allows clients to verify that DNS responses originate from the correct source (authenticity) and have not been tampered with (integrity).
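At its core, the check a DANE-aware client performs is a comparison between what the server presents during the TLS handshake and what the domain published in its DNS record (a so-called TLSA record). The sketch below covers only that comparison step. `tlsa_matches` is a hypothetical helper, the input bytes are a placeholder rather than a real key, and both DNSSEC validation of the record and extraction of the key from the certificate are assumed to have happened elsewhere.

```python
import hashlib


def tlsa_matches(selected_der: bytes, matching_type: int, association_hex: str) -> bool:
    """Compare presented data against the published TLSA association data.

    matching_type: 0 = exact bytes, 1 = SHA-256 digest, 2 = SHA-512 digest.
    selected_der: the DER bytes picked by the record's selector
                  (full certificate or public key), assumed given here.
    """
    if matching_type == 0:
        candidate = selected_der
    elif matching_type == 1:
        candidate = hashlib.sha256(selected_der).digest()
    elif matching_type == 2:
        candidate = hashlib.sha512(selected_der).digest()
    else:
        raise ValueError(f"unknown matching type: {matching_type}")
    return candidate == bytes.fromhex(association_hex)


# Placeholder standing in for Bob's DER-encoded public key.
bob_key = b"placeholder-for-bobs-DER-encoded-public-key"
published = hashlib.sha256(bob_key).hexdigest()  # what Bob puts into DNS

eve_key = b"placeholder-for-eves-DER-encoded-public-key"

print(tlsa_matches(bob_key, 1, published))
print(tlsa_matches(eve_key, 1, published))
```

Because the published value is protected by DNSSEC, Eve can neither hide it nor replace it with a hash of her own key without being detected.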

So why are we still talking about security problems in email communication if we have DANE? The truth is that DNSSEC is still not commonly used, so we cannot rely on its presence at internet scale. It is often considered too complex and error-prone, which leaves us with a patchwork of mechanisms nowadays, from STARTTLS to MTA-STS and in rare cases DANE.

Conclusion

Securing email communication is hard. The ecosystem is historically grown, yet at the same time crucial to modern day business communication and cannot easily be replaced as a whole. Due to its importance, we still see robustness concerns held higher than security requirements, although apparently things are changing for the better.

In the end, the first step is always understanding the problem, and security is always about the details, so I hope that this article can help a bit on that front.

What we can do as users is to configure (mandatory) TLS instead of STARTTLS in our email clients and to double-check when certificate errors appear.

On the infrastructure side, the stronger the implemented security mechanisms, the higher the chance to defend against attackers. While DANE is a great solution on paper, pragmatically one should probably implement MTA-STS as well to provide as many avenues for clients to increase their security as possible. After all, if clients do support multiple mechanisms, we can hopefully assume that they will pick the strongest one.

Bonus FAQ

Why are you writing about this?

Our SaaS solution Findalyze is actively identifying and reporting invalid certificates. While we explain the reasoning behind our findings in the app itself, putting everything in the large context of email security mechanisms and attacker models is outside the scope of Findalyze and better done in form of an article like this one. Bonus points: we now have a resource we can link to whenever needed ;-)

What's all this madness with port 465?

Port 465 has an interesting and, at times, confusing history. I strongly recommend reading the corresponding RFC paragraph that explains why 465 was originally reserved for SMTP + mandatory TLS, then freed up again because STARTTLS was considered the way forward, and then registered again because (a) popular email clients were still using it for submission and (b) mandatory TLS provides stronger guarantees than STARTTLS, as outlined above.

So nowadays we have 465 for SMTP via mandatory TLS and 587, which is typically used for both plaintext SMTP with STARTTLS support and direct TLS connections. The same goes for port 25, which handles the transmission of emails between MTAs, because it supports both plaintext and TLS. I assume we will still see support and usage of all three ports, at least for a while.

How useful really is certificate validation in machine-to-machine communication?

There is an interesting discussion thread from 2018 where the expansion of Let's Encrypt into the email ecosystem was discussed. One participant strongly argued that for machine-to-machine (i.e., MTA-to-MTA) communication, validating the chain of trust for TLS certificates does not make any sense at all because no trust anchor was ever officially established. I agree with the author that DANE is the "proper" solution, but the low adoption of DNSSEC is preventing us from truly using it at scale at the moment.

I encourage everyone to read the thread and form their own opinion. Here is mine:

Indeed, both RFC 8314 and RFC 7817 define certificate validation rules specifically for MUAs, i.e., email clients talking to email servers. Machine-to-machine communication without a human in the loop does not seem to be regulated here. Still, I personally do not buy the "Requiring a signed certificate simply can not work because there is not official list of trusted certificate authorities" argument. We already have a set of CAs that we trust for HTTPS connections. In automated cases like Let's Encrypt, you have to prove control over the domain you request a certificate for. The same mechanism would fit our email use case: proving to a CA that we control a certain domain should be enough to receive a valid certificate for it, be it for HTTPS or for SMTP over TLS. True, the CA system has its own host of flaws, but once again, a distributed solution like DANE via DNSSEC is too far from widespread adoption to be a reasonable required technology.

Wasn't there the idea of an MTA-STS preload list to get rid of TOFU?

Exactly, the STARTTLS Everywhere project started with the noble goal to help spread the adoption of the STARTTLS option for opportunistic TLS in 2014. One initiative was the creation of a centralized preload list, as we have it for HSTS in the web ecosystem, so that maintainers can register their email servers to support certain policies. Think MTA-STS but using a locally downloaded, fixed list instead of a live DNS request.

While this would solve TOFU for those servers being part of the list, centralized preloading lists tend to have their own host of problems.

Eventually, the project wound down support for the preload list and stopped accepting requests to be included in the list due to its low adoption.