Email Basics: Introduction to SMTP

Note: This is part 1 of a two-part segment on Email Basics. You can also read part 2 of this segment, which deals with the internal format of emails.

With all of the government spying scares circulating recently, many people have started taking another look at the companies and people responsible for their email. Like most technology today, email is easy to use but difficult to understand, and despite email’s widespread adoption, most people have only a vague notion of massive computers routing messages over some obscure protocol. It seems to be getting harder and harder to trust a third party with something as personal and sensitive as your email account, but a lot of this mistrust comes from a fundamental lack of understanding of how email works.

Email runs alongside, and independently of, the World Wide Web, which is what many of us have come to think of as the Internet. In reality, the web is just one of many application-level protocols that together make up Internet traffic. Email is actually built on 2 separate standards that work together, but were designed to be able to operate independently. These are: SMTP, which describes how email is passed along to its destination, and RFC822, which describes the abilities and format of email. These two are Internet Standards, published by the IETF (Internet Engineering Task Force), and anybody who wants to be able to send or receive email follows them.

SMTP is defined in a document known as RFC821 (which comes right before RFC822). I’ve left RFC822 for part 2 of this blog series and will focus on describing SMTP here.

SMTP is responsible for ensuring delivery of a package. It couldn’t care less about what that package contained or how it was structured, so long as it was delivered. SMTP runs on the TCP protocol, which is also used for things like websites. The TCP protocol is able to distinguish between somebody requesting a website and something sending an email by assigning to each of its applications one or more port numbers. SMTP has been assigned port 25.

The nifty thing about TCP is that it provides a lot of things for free and takes care of all the messy business. Some of these things include:

Sending messages of any size
Guaranteeing that everything you send will be delivered in order (or telling you when the connection has failed)
Providing 2-way reliable communication without stuff getting messed up in the middle

For this reason, a lot of Internet stuff is built on top of TCP to take advantage of all the existing infrastructure that supports it. So what goes through TCP? Well, just about anything. You can send anonymous encouragement like so:

   You: $(nc g.co 80)
   You: Hey you! You're the best!
Server: ...

You can receive messages back in the same way. In this example, there is a distinction between the client (You) and the server (computer that receives your messages). However, TCP is completely symmetric once the initial connection is established: both you and the server can send data, receive data, and close the connection.
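To make the symmetry concrete, here is a minimal Python sketch that runs both ends of a TCP connection on your own machine. The function names (read_all, serve_once, talk_to_local_server) are made up for this illustration, and the "server" is just a thread that reads one message and answers it. Treat it as a toy under those assumptions, not as how a real server is written.

```python
import socket
import threading


def read_all(sock):
    """Read from the socket until the other side stops sending."""
    chunks = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def serve_once(listener):
    """Accept one connection, read the whole message, and answer it."""
    conn, _addr = listener.accept()
    with conn:
        message = read_all(conn)
        conn.sendall(b"Server heard: " + message)


def talk_to_local_server(message):
    """Open a TCP connection to a throwaway local server and return its reply."""
    # Port 0 asks the operating system for any free port.
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        # Tell the server we are done sending; we can still receive.
        client.shutdown(socket.SHUT_WR)
        reply = read_all(client)
    worker.join()
    listener.close()
    return reply


if __name__ == "__main__":
    print(talk_to_local_server(b"Hey you! You're the best!"))
```

Both sides use the same sendall and recv calls. The only asymmetry is in who listens and who connects.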

The SMTP Conversation

Package delivery through SMTP works like a conversation. This conversation takes place between a client, who has a package to deliver, and a server, who is receiving the package. Once the package is received by the server, the client can forget about it and the server assumes full responsibility for the package and its delivery. In this way, servers sometimes become clients themselves if they need to forward packages on to other servers.

Like any good conversation, SMTP begins with an introduction by both parties:

Server: 220 localhost Greetings
   You: HELO roger
Server: 250 localhost

In case you’ve got a bad memory, the server gives you its name twice: once when you open the connection, and again after you introduce yourself with a HELO. The names used here are special names known as hostnames. In real-world SMTP, these hostnames are usually fully-qualified domain names (FQDNs) and look like mail.rogerhub.com.

You’ve both said helo. Now we get down to business. The usual thing to do at this point is to tell the server about the package you’re delivering. Of course, if you’re shy you can just walk away.

   You: QUIT
Server: 221 Service closing transmission channel
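Server replies in SMTP start with a three-digit code followed by some human-readable text, and the first digit tells you roughly how things went. The small Python sketch below splits a reply into its parts. The function names and the wording of the category labels are my own for this illustration; only the meaning of the first digit comes from the SMTP specification.

```python
def parse_reply(line):
    """Split one SMTP reply line into (code, text)."""
    line = line.rstrip("\r\n")
    # The first three characters are the code; the fourth is a separator.
    return int(line[:3]), line[4:]


def reply_class(code):
    """Describe a reply code by its first digit (labels are informal)."""
    labels = {
        2: "success",
        3: "more input expected",
        4: "temporary failure",
        5: "permanent failure",
    }
    return labels.get(code // 100, "unknown")


if __name__ == "__main__":
    for reply in ("220 localhost Greetings",
                  "221 Service closing transmission channel"):
        code, text = parse_reply(reply)
        print(code, reply_class(code), "|", text)
```

A client only needs the numeric code to decide what to do next. The text is there for humans reading logs or typing into netcat.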

To declare that you’ve got a package to deliver, use the MAIL FROM command. The command is followed by a single colon and then the email address of the sender in angle brackets. Only the bare address goes here; display names like "Finn the Human" belong inside the message itself.

   You: MAIL FROM:<finn@treehouse>
Server: 250 OK

Next, you declare the recipients of the package with RCPT TO. If there is more than one, which happens often, you declare each recipient one at a time.

   You: RCPT TO:<jake@treehouse>
Server: 250 OK
   You: RCPT TO:<bmo@treehouse>
Server: 250 OK
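If you were writing a client, the envelope part of the conversation boils down to one MAIL FROM line plus one RCPT TO line per recipient. Here is a small Python sketch of that. The function name envelope_commands is hypothetical, and it uses bare addresses in angle brackets, which is the form the SMTP specification defines for these commands.

```python
def envelope_commands(sender, recipients):
    """Build the MAIL FROM and RCPT TO lines for one message."""
    lines = [f"MAIL FROM:<{sender}>"]
    # Each recipient gets its own RCPT TO command.
    lines += [f"RCPT TO:<{recipient}>" for recipient in recipients]
    return lines


if __name__ == "__main__":
    for line in envelope_commands("finn@treehouse",
                                  ["jake@treehouse", "bmo@treehouse"]):
        print(line)
```

A real client would send these one at a time and check for a 250 reply after each before moving on.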

So far so good. We are not far from the end. Once the formalities are exchanged, you can begin delivering the package with the DATA command. After you’ve delivered the whole package, you tell the server that you’re done by sending a single period on its own line.

   You: DATA
Server: 354 Start mail input; end with <CRLF>.<CRLF>
   You: Jake,

        Left the house for a few hours. Will be back
        soon.

        Not joking,
        Finn
        .
Server: 250 OK
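One wrinkle with ending the package on a lone period: what if the message itself contains a line that starts with a period? The SMTP specification handles this by having the client add an extra period to the front of any such line, and the server strips it off again. Here is a rough Python sketch of the client side. The function name data_payload is made up for this post.

```python
def data_payload(body):
    """Turn a message body into the bytes-on-the-wire text sent after DATA."""
    lines = body.splitlines()
    # Double the leading period so the line is not mistaken for the end marker.
    stuffed = ["." + line if line.startswith(".") else line for line in lines]
    # SMTP lines end in CRLF, and a lone period finishes the message.
    return "\r\n".join(stuffed) + "\r\n.\r\n"


if __name__ == "__main__":
    print(repr(data_payload("Jake,\n\nBack soon.\n.hidden line\nFinn")))
```

Libraries like Python's smtplib do this for you, so you only have to think about it when typing into netcat by hand.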

That’s it! At this point, you can say goodbye with QUIT or send another email. (These command responses are straight from a demo SMTP server implementation on my GitHub.)
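If you want to replay the whole conversation without bothering anybody's mail server, the sketch below runs a toy server on your own machine and talks to it with Python's standard smtplib client. ToySMTPHandler and run_demo are names invented for this example, and this is not the demo server mentioned above. The toy server accepts everything, stores nothing, and only knows the handful of commands covered so far.

```python
import smtplib
import socketserver
import threading


class ToySMTPHandler(socketserver.StreamRequestHandler):
    """Hypothetical toy server: accepts every command and stores no mail."""

    def reply(self, text):
        self.wfile.write(text.encode("ascii") + b"\r\n")

    def handle(self):
        self.reply("220 localhost Greetings")
        while True:
            raw = self.rfile.readline()
            if not raw:  # client hung up without saying QUIT
                return
            line = raw.decode("ascii").rstrip("\r\n")
            self.server.log.append(line)
            verb = line.upper()
            if verb.startswith("HELO"):
                self.reply("250 localhost")
            elif verb.startswith(("MAIL FROM:", "RCPT TO:")):
                self.reply("250 OK")
            elif verb == "DATA":
                self.reply("354 Start mail input; end with <CRLF>.<CRLF>")
                # Swallow the message body up to the lone period.
                while self.rfile.readline() not in (b".\r\n", b""):
                    pass
                self.reply("250 OK")
            elif verb == "QUIT":
                self.reply("221 Service closing transmission channel")
                return
            else:
                self.reply("500 Command not recognized")


def run_demo():
    """Send one message to the toy server; return (reply codes, command log)."""
    server = socketserver.TCPServer(("127.0.0.1", 0), ToySMTPHandler)
    server.log = []
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address

    client = smtplib.SMTP(host, port, local_hostname="roger")
    codes = [
        client.helo("roger")[0],
        client.mail("finn@treehouse")[0],
        client.rcpt("jake@treehouse")[0],
        client.rcpt("bmo@treehouse")[0],
        client.data("Jake,\r\n\r\nBack soon.\r\n\r\nFinn\r\n")[0],
        client.quit()[0],
    ]
    server.shutdown()
    server.server_close()
    return codes, server.log


if __name__ == "__main__":
    codes, log = run_demo()
    print(codes)
    print(log)
```

The log shows the commands exactly as smtplib put them on the wire, so it doubles as a way to peek at what a real client sends.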

But this can’t be all there is! Is there more to SMTP? You bet there is!

In fact, SMTP and RFC822 have both seen many revisions to the specification over the years. One of the major changes to the specification is in the initial greeting. Instead of HELO, an alternate greeting EHLO, or extended helo, was proposed. When you greet an EHLO-aware server with EHLO, it will send you an extended SMTP (or ESMTP) response like so:

Server: 220 localhost Greetings
   You: EHLO roger
Server: 250-localhost
Server: 250-PIPELINING
Server: 250 SIZE 10000000

The server sends a list of SMTP extensions that it supports. These include things like STARTTLS, which provides a way to upgrade a plaintext connection (like all the transmissions seen here) to an encrypted TLS session, which is more resilient to snooping. Both the client and the server have to support an extension before you can use it. Extensions open up the path to some extra-cool stuff like authentication and UTF-8 support.
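Notice that every line of the EHLO response except the last has a hyphen right after the code. That hyphen means "more lines follow", and a space means "this is the last one". Here is a short Python sketch that turns such a response into a server name and a table of extensions. The function name parse_ehlo and the shape of its return value are my own choices for illustration.

```python
def parse_ehlo(lines):
    """Parse a multi-line 250 response into (server_name, extensions)."""
    server_name = None
    extensions = {}
    for index, raw in enumerate(lines):
        line = raw.rstrip("\r\n")
        code, separator, text = line[:3], line[3:4], line[4:]
        if code != "250":
            raise ValueError(f"unexpected reply: {line!r}")
        if index == 0:
            # The first line carries the server's name, not an extension.
            server_name = text.split()[0] if text else ""
        else:
            keyword, _, params = text.partition(" ")
            extensions[keyword.upper()] = params
        if separator != "-":  # a space marks the final line
            break
    return server_name, extensions


if __name__ == "__main__":
    response = ["250-localhost", "250-PIPELINING", "250 SIZE 10000000"]
    print(parse_ehlo(response))
```

A client would then check something like `"STARTTLS" in extensions` before trying to use that extension.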

You can try out this whole process by sending an email to a friend, just like a real mail server would do. The first step is to find the server to contact! The name you look up is the domain part of the email address (e.g. gmail.com), and the query type is mx, which stands for Mail Exchanger.

$ nslookup -q=mx gmail.com
...

After that, you just need to open a TCP connection to one of the listed Mail Exchangers. You can do this with either netcat or telnet, whose availability depends on your operating system.

$ nc example.com 25       # netcat
$ telnet example.com 25   # telnet

Fire away! (But don’t be surprised if your phony emails are discarded as spam. Haha)